Lightweight Portrait Segmentation Models: SINet and ExtremeC3Net


This article introduces two lightweight portrait segmentation models, SINet and ExtremeC3Net, with 0.087 M and 0.038 M parameters and 0.064 G and 0.128 G FLOPs respectively. Both can be invoked quickly through PaddleHub or deployed with PaddleInference, and a Paddle 2.0 implementation of each is provided.




Introduction

  • As compute and algorithms keep improving, ever-larger models can be trained, with accuracy rising accordingly
  • Oversized models, however, are inconvenient to deploy
  • This article introduces two lightweight portrait segmentation models: SINet and ExtremeC3Net

Project Notes

  • The models in this project were converted from the open-source project ext_portrait_segmentation
  • Thanks to that project for its open-source code and models

Model Specifications

  • The specifications of the two models are listed below:

    Model        Params     FLOPs
    SINet        0.087 M    0.064 G
    ExtremeC3    0.038 M    0.128 G

  • As the table shows, both models are extremely lightweight; a quick parameter-count sanity check is sketched below
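The sketch below sums the sizes of all trainable tensors to verify the parameter counts. It assumes the SINet and ExtremeC3Net classes defined in the Model Implementation section are already in scope; the helper name count_params is ours, not part of either project.

# A minimal sketch for verifying parameter counts; assumes the SINet /
# ExtremeC3Net classes from the Model Implementation section are in scope.
import numpy as np

def count_params(model):
    # Total element count across all parameter tensors
    return sum(int(np.prod(p.shape)) for p in model.parameters())

# e.g. count_params(ExtremeC3Net(classes=2)) should land in the ~0.04 M
# range if the hyper-parameters match the released portrait model.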

Results

  • ExtremeC3Net:

[Figure: ExtremeC3Net portrait segmentation result]

  • SINet:

[Figure: SINet portrait segmentation result]

Quick Start

  • As usual, both models have been packaged as PaddleHub Modules
  • They can therefore be invoked with a few lines of PaddleHub code
In [1]
!pip install paddlehub==2.0.0b2
   
In [9]
# Import PaddleHub
import paddlehub as hub

# Load the model
# Available models: SINet_Portrait_Segmentation and ExtremeC3_Portrait_Segmentation
model = hub.Module(directory='SINet_Portrait_Segmentation')

# Portrait segmentation
outputs = model.Segmentation(
    images=None,
    paths=['00001.jpg'],
    batch_size=1,
    output_dir='output',
    visualization=True)

# Display the results
%matplotlib inline
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = np.concatenate([
    cv2.imread('00001.jpg'),
    cv2.cvtColor(outputs[0]['mask'], cv2.COLOR_GRAY2BGR),
    outputs[0]['result']
], 1)
plt.axis('off')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()
       
               
In [3]
# Import PaddleHub
import paddlehub as hub

# Load the model
# Available models: SINet_Portrait_Segmentation and ExtremeC3_Portrait_Segmentation
model = hub.Module(directory='ExtremeC3_Portrait_Segmentation')

# Portrait segmentation
outputs = model.Segmentation(
    images=None,
    paths=['00001.jpg'],
    batch_size=1,
    output_dir='output',
    visualization=True)

# Display the results
%matplotlib inline
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = np.concatenate([
    cv2.imread('00001.jpg'),
    cv2.cvtColor(outputs[0]['mask'], cv2.COLOR_GRAY2BGR),
    outputs[0]['result']
], 1)
plt.axis('off')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()
       
               

Inference Deployment

  • Besides the one-line PaddleHub call, the exported inference models can also be deployed directly
  • The following briefly shows how to run inference with PaddleInference
  • For more details, see my other project: PaddleQuickInference
In [4]
# Install PaddleQuickInference
!pip install ppqi -i https://pypi.python.org/simple
   
In [5]
from ppqi import InferenceModel
from SINet.processor import preprocess, postprocess

# Configuration
configs = {
    'img_path': '00001.jpg',
    'save_dir': 'save_img',
    'model_name': 'SINet_Portrait_Segmentation',
    'use_gpu': False,
    'use_mkldnn': False
}

# Step 1: preprocess the input
input_data = preprocess(configs['img_path'])

# Step 2: load the model
model = InferenceModel(
    modelpath='SINet/'+configs['model_name'], 
    use_gpu=configs['use_gpu'], 
    use_mkldnn=configs['use_mkldnn']
)
model.eval()

# Step 3: run inference
output = model(input_data)

# Step 4: postprocess the result
postprocess(
    output, 
    configs['save_dir'],
    configs['img_path'],
    configs['model_name']
)
   
In [6]
from ppqi import InferenceModel
from ExtremeC3Net.processor import preprocess, postprocess

# Configuration
configs = {
    'img_path': '00001.jpg',
    'save_dir': 'save_img',
    'model_name': 'ExtremeC3_Portrait_Segmentation',
    'use_gpu': False,
    'use_mkldnn': False
}

# Step 1: preprocess the input
input_data = preprocess(configs['img_path'])

# Step 2: load the model
model = InferenceModel(
    modelpath='ExtremeC3Net/'+configs['model_name'], 
    use_gpu=configs['use_gpu'], 
    use_mkldnn=configs['use_mkldnn']
)
model.eval()

# Step 3: run inference
output = model(input_data)

# Step 4: postprocess the result
postprocess(
    output, 
    configs['save_dir'],
    configs['img_path'],
    configs['model_name']
)
   

Model Implementation

  • Finally, here is how the two models are implemented in Paddle 2.0
  • The code closely follows the original project's PyTorch code
  • A few operators were replaced to account for differences between the frameworks, as sketched below
  • Refer to the code below for full details
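To make the substitutions concrete, here is a hedged sketch of the typical PyTorch-to-Paddle 2.0 replacements involved in this kind of port (illustrative pairs, not an exhaustive diff of the original repo), followed by a runnable check of the reshape/transpose idiom that stands in for view/permute in channel_shuffle:

# Typical PyTorch -> Paddle 2.0 substitutions (illustrative, not exhaustive):
#   torch.nn.Conv2d(..., bias=False)   -> paddle.nn.Conv2D(..., bias_attr=False)
#   torch.nn.BatchNorm2d               -> paddle.nn.BatchNorm2D
#   torch.nn.ModuleList()              -> paddle.nn.LayerList()
#   x.view(b, g, c, h, w)              -> x.reshape([b, g, c, h, w])
#   x.permute(0, 2, 1, 3, 4)           -> paddle.transpose(x, [0, 2, 1, 3, 4])
#   torch.cat(tensors, dim=1)          -> paddle.concat(tensors, 1)
# Caution: BatchNorm momentum semantics are inverted between the frameworks
# (PyTorch momentum m corresponds to Paddle momentum 1 - m); this matters for
# training but not for inference with loaded statistics.
import paddle

# Runnable check of the channel-shuffle reshape/transpose idiom:
x = paddle.arange(24, dtype='float32').reshape([1, 6, 2, 2])  # NCHW
g = 2
y = paddle.transpose(x.reshape([1, g, 6 // g, 2, 2]), [0, 2, 1, 3, 4])
y = y.reshape([1, 6, 2, 2])
print(y.shape)  # [1, 6, 2, 2]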
In [7]
# model/sinet.py
'''
ExtPortraitSeg
Copyright (c) 2019-present NAVER Corp.
MIT license
'''
import paddle
import paddle.nn as nn

BN_moment = 0.1

def channel_shuffle(x, groups):
    batchsize, num_channels, height, width = x.shape

    channels_per_group = num_channels // groups

    # reshape
    x = x.reshape([batchsize, groups,
                   channels_per_group, height, width])
    # transpose
    x = paddle.transpose(x, [0, 2, 1, 3, 4])
    # flatten
    x = x.reshape([batchsize, groups*channels_per_group, height, width])
    return x

class CBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)

        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)
        return output

class separableCBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)

        self.conv = nn.Sequential(
            nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False),
            nn.Conv2D(nIn, nOut,  kernel_size=1, stride=1, bias_attr=False),
        )
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)
        return output

class SqueezeBlock(nn.Layer):
    def __init__(self, exp_size, divide=4.0):
        super(SqueezeBlock, self).__init__()
        if divide > 1:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, int(exp_size / divide)),
                nn.PReLU(int(exp_size / divide)),
                nn.Linear(int(exp_size / divide), exp_size),
                nn.PReLU(exp_size),
            )
        else:
            self.dense = nn.Sequential(
                nn.Linear(exp_size, exp_size),
                nn.PReLU(exp_size)
            )

    def forward(self, x):
        batch, channels, height, width = x.shape
        out = paddle.nn.functional.avg_pool2d(x, kernel_size=[height, width]).reshape([batch, channels])
        
        out = self.dense(out)
        out = out.reshape([batch, channels, 1, 1])
        return paddle.multiply(out, x)

class SEseparableCBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1, divide=2.0):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)

        self.conv = nn.Sequential(
            nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False),
            SqueezeBlock(nIn, divide=divide),
            nn.Conv2D(nIn, nOut,  kernel_size=1, stride=1, bias_attr=False),
        )

        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)
        return output

class BR(nn.Layer):
    '''
        This class groups the batch normalization and PReLU activation
    '''

    def __init__(self, nOut):
        '''
        :param nOut: output feature maps
        '''
        super().__init__()
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: normalized and thresholded feature map
        '''
        output = self.bn(input)
        output = self.act(output)
        return output

class CB(nn.Layer):
    '''
       This class groups the convolution and batch normalization
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        return output

class C(nn.Layer):
    '''
    This class is for a convolutional layer.
    '''

    def __init__(self, nIn, nOut, kSize, stride=1,group=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride,
                              padding=(padding, padding), bias_attr=False, groups=group)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        return output

class S2block(nn.Layer):
    '''
    This class defines the dilated convolution.
    '''

    def __init__(self, nIn, nOut, config):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        :param d: optional dilation rate
        '''
        super().__init__()
        kSize = config[0]
        avgsize = config[1]

        self.resolution_down = False
        if avgsize >1:
            self.resolution_down = True
            self.down_res = nn.AvgPool2D(avgsize, avgsize)
            self.up_res = nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=avgsize)
            self.avgsize = avgsize

        padding = int((kSize - 1) / 2 )
        self.conv = nn.Sequential(
                        nn.Conv2D(nIn, nIn, kernel_size=(kSize, kSize), stride=1,
                                  padding=(padding, padding), groups=nIn, bias_attr=False),
                        nn.BatchNorm2D(nIn, epsilon=1e-03, momentum=BN_moment))

        self.act_conv1x1 = nn.Sequential(
            nn.PReLU(nIn),
            nn.Conv2D(nIn, nOut, kernel_size=1, stride=1, bias_attr=False),
        )

        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03, momentum=BN_moment)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        if self.resolution_down:
            input = self.down_res(input)
        output = self.conv(input)
        output = self.act_conv1x1(output)
        if self.resolution_down:
            output = self.up_res(output)
        return self.bn(output)

class S2module(nn.Layer):
    '''
    This class defines the ESP block, which is based on the following principle
        Reduce ---> Split ---> Transform --> Merge
    '''

    def __init__(self, nIn, nOut, add=True, config= [[3,1],[5,1]]):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param add: if true, add a residual connection through identity operation. You can use projection too as
                in ResNet paper, but we avoid to use it if the dimensions are not the same because we do not want to
                increase the module complexity
        '''
        super().__init__()

        group_n = len(config)
        n = int(nOut / group_n)
        n1 = nOut - group_n * n

        self.c1 = C(nIn, n, 1, 1, group=group_n)
        for i in range(group_n):
            var_name = 'd{}'.format(i + 1)
            if i == 0:
                self.__dict__["_sub_layers"][var_name] = S2block(n, n + n1, config[i])
            else:
                self.__dict__["_sub_layers"][var_name] = S2block(n, n, config[i])

        self.BR = BR(nOut)
        self.add = add
        self.group_n = group_n

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        # reduce
        output1 = self.c1(input)
        output1 = channel_shuffle(output1, self.group_n)
        for i in range(self.group_n):
            var_name = 'd{}'.format(i + 1)
            result_d = self.__dict__["_sub_layers"][var_name](output1)
            if i == 0:
                combine = result_d
            else:
                combine = paddle.concat([combine, result_d], 1)

        # if residual version
        if self.add:
            combine = paddle.add(input, combine)
        output = self.BR(combine)
        return output

class InputProjectionA(nn.Layer):
    '''
    This class projects the input image to the same spatial dimensions as the feature map.
    For example, if the input image is 512 x512 x3 and spatial dimensions of feature map size are 56x56xF, then
    this class will generate an output of 56x56x3
    '''

    def __init__(self, samplingTimes):
        '''
        :param samplingTimes: The rate at which you want to down-sample the image
        '''
        super().__init__()
        self.pool = nn.LayerList()
        for i in range(0, samplingTimes):
            self.pool.append(nn.AvgPool2D(2, stride=2))

    def forward(self, input):
        '''
        :param input: Input RGB Image
        :return: down-sampled image (pyramid-based approach)
        '''
        for pool in self.pool:
            input = pool(input)
        return input

class SINet_Encoder(nn.Layer):

    def __init__(self, config,classes=20, p=5, q=3,  chnn=1.0):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()
        dim1 = 16
        dim2 = 48 + 4 * (chnn - 1)
        dim3 = 96 + 4 * (chnn - 1)

        self.level1 = CBR(3, 12, 3, 2)

        self.level2_0 = SEseparableCBR(12,dim1, 3,2, divide=1)

        self.level2 = nn.LayerList()
        for i in range(0, p):
            if i == 0:
                self.level2.append(S2module(dim1, dim2, config=config[i], add=False))
            else:
                self.level2.append(S2module(dim2, dim2, config=config[i]))
        self.BR2 = BR(dim2+dim1)

        self.level3_0 = SEseparableCBR(dim2+dim1, dim2, 3, 2, divide=2)
        self.level3 = nn.LayerList()
        for i in range(0, q):
            if i == 0:
                self.level3.append(S2module(dim2, dim3, config=config[2 + i], add=False))
            else:
                self.level3.append(S2module(dim3, dim3, config=config[2 + i]))
        self.BR3 = BR(dim3+dim2)

        self.classifier = C(dim3+dim2, classes, 1, 1)

    def forward(self, input):
        '''
        :param input: Receives the input RGB image
        :return: the transformed feature map with spatial dimensions 1/8th of the input image
        '''
        output1 = self.level1(input) #8h 8w


        output2_0 = self.level2_0(output1)  # 4h 4w

        for i, layer in enumerate(self.level2):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2) # 2h 2w


        output3_0 = self.level3_0(self.BR2(paddle.concat([output2_0, output2],1)))  # h w

        for i, layer in enumerate(self.level3):
            if i == 0:
                output3 = layer(output3_0)
            else:
                output3 = layer(output3)

        output3_cat = self.BR3(paddle.concat([output3_0, output3], 1))

        classifier = self.classifier(output3_cat)
        return classifier

class SINet(nn.Layer):

    def __init__(self,config, classes=20, p=2, q=3, chnn=1.0):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()
        dim2 = 48 + 4 * (chnn - 1)

        self.encoder = SINet_Encoder(config, classes, p, q, chnn)

        self.up = nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2)
        self.bn_3 = nn.BatchNorm2D(classes, epsilon=1e-03)

        self.level2_C = CBR(dim2, classes, 1, 1)

        self.bn_2 = nn.BatchNorm2D(classes, epsilon=1e-03)

        self.classifier = nn.Sequential(
            nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2),
            nn.Conv2D(classes, classes, 3, 1, 1, bias_attr=False))

    def forward(self, input):
        '''
        :param input: RGB image
        :return: transformed feature map
        '''
        output1 = self.encoder.level1(input)  # 8h 8w
        output2_0 = self.encoder.level2_0(output1)  # 4h 4w

        for i, layer in enumerate(self.encoder.level2):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2)  # 2h 2w

        output3_0 = self.encoder.level3_0(self.encoder.BR2(paddle.concat([output2_0, output2], 1)))  # h w

        for i, layer in enumerate(self.encoder.level3):
            if i == 0:
                output3 = layer(output3_0)
            else:
                output3 = layer(output3)

        output3_cat = self.encoder.BR3(paddle.concat([output3_0, output3], 1))
        Enc_final = self.encoder.classifier(output3_cat) #1/8

        Dnc_stage1 = self.bn_3(self.up(Enc_final))  # 1/4
        stage1_confidence = paddle.max(nn.functional.softmax(Dnc_stage1, 1), axis=1)

        b, c, h, w = Dnc_stage1.shape
        stage1_gate = (1-stage1_confidence).unsqueeze(1).expand([b, c, h, w])

        Dnc_stage2_0 = self.level2_C(output2)  # 2h 2w
        Dnc_stage2 = self.bn_2(self.up(paddle.add(paddle.multiply(Dnc_stage2_0, stage1_gate), (Dnc_stage1))))  # 4h 4w

        classifier = self.classifier(Dnc_stage2)
        return classifier
   
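A quick smoke test of the SINet class defined above (a sketch: the config entries are illustrative [kernel size, avg-pool size] pairs for each S2module, not the configuration shipped with the pretrained weights):

# Smoke test for the SINet implementation above; the config values are
# illustrative, not the pretrained configuration.
config = [[[3, 1], [5, 1]], [[3, 1], [3, 1]],
          [[3, 1], [5, 1]], [[3, 1], [3, 1]], [[5, 1], [3, 2]]]
model = SINet(config=config, classes=2, p=2, q=3, chnn=1)
model.eval()
x = paddle.randn([1, 3, 224, 224])
y = model(x)
print(y.shape)  # expected: [1, 2, 224, 224]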
In [8]
# model/extremeC3.py
'''
ExtPortraitSeg
Copyright (c) 2019-present NAVER Corp.
MIT license
'''
import paddle
import paddle.nn as nn

basic_0 = 24
basic_1 = 48
basic_2 = 56
basic_3 = 24

class CBR(nn.Layer):
    '''
    This class defines the convolution layer with batch normalization and PReLU activation
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: stride rate for down-sampling. Default is 1
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03)
        self.act = nn.PReLU(nOut)


    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)
        return output

class BR(nn.Layer):
    '''
        This class groups the batch normalization and PReLU activation
    '''

    def __init__(self, nOut):
        '''
        :param nOut: output feature maps
        '''
        super().__init__()
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: normalized and thresholded feature map
        '''
        output = self.bn(input)
        output = self.act(output)
        return output

class CB(nn.Layer):
    '''
       This class groups the convolution and batch normalization
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)
        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-03)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        output = self.bn(output)
        return output

class C(nn.Layer):
    '''
    This class is for a convolutional layer.
    '''

    def __init__(self, nIn, nOut, kSize, stride=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        '''
        super().__init__()
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2D(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias_attr=False)

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        return output

class C3block(nn.Layer):
    '''
    This class defines the dilated convolution.
    '''

    def __init__(self, nIn, nOut, kSize, stride=1, d=1):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param kSize: kernel size
        :param stride: optional stride rate for down-sampling
        :param d: optional dilation rate
        '''
        super().__init__()
        padding = int((kSize - 1) / 2) * d
        if d == 1:
            self.conv = nn.Sequential(
                nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False,
                          dilation=d),
                nn.Conv2D(nIn, nOut, kernel_size=1, stride=1, bias_attr=False)
            )
        else:
            combine_kernel = 2 * d - 1

            self.conv = nn.Sequential(
                nn.Conv2D(nIn, nIn, kernel_size=(combine_kernel, 1), stride=stride, padding=(padding - 1, 0),
                          groups=nIn, bias_attr=False),
                nn.BatchNorm2D(nIn),
                nn.PReLU(nIn),
                nn.Conv2D(nIn, nIn, kernel_size=(1, combine_kernel), stride=stride, padding=(0, padding - 1),
                          groups=nIn, bias_attr=False),
                nn.BatchNorm2D(nIn),
                nn.Conv2D(nIn, nIn, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn, bias_attr=False,
                          dilation=d),
                nn.Conv2D(nIn, nOut, kernel_size=1, stride=1, bias_attr=False))

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        output = self.conv(input)
        return output

class Down_advancedC3(nn.Layer):
    def __init__(self, nIn, nOut, ratio=[2,4,8]):
        super().__init__()
        n = int(nOut // 3)
        n1 = nOut - 3 * n
        self.c1 = C(nIn, n, 3, 2)

        self.d1 = C3block(n, n+n1, 3, 1, ratio[0])
        self.d2 = C3block(n, n, 3, 1, ratio[1])
        self.d3 = C3block(n, n, 3, 1, ratio[2])

        self.bn = nn.BatchNorm2D(nOut, epsilon=1e-3)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        output1 = self.c1(input)
        d1 = self.d1(output1)
        d2 = self.d2(output1)
        d3 = self.d3(output1)

        combine = paddle.concat([d1, d2, d3], 1)

        output = self.bn(combine)
        output = self.act(output)
        return output

class AdvancedC3(nn.Layer):
    '''
    This class defines the ESP block, which is based on the following principle
        Reduce ---> Split ---> Transform --> Merge
    '''

    def __init__(self, nIn, nOut, add=True, ratio=[2,4,8]):
        '''
        :param nIn: number of input channels
        :param nOut: number of output channels
        :param add: if true, add a residual connection through identity operation. You can use projection too as
                in ResNet paper, but we avoid to use it if the dimensions are not the same because we do not want to
                increase the module complexity
        '''
        super().__init__()
        n = int(nOut // 3)
        n1 = nOut - 3 * n
        self.c1 = C(nIn, n, 1, 1)

        self.d1 = C3block(n, n + n1, 3, 1, ratio[0])
        self.d2 = C3block(n, n, 3, 1, ratio[1])
        self.d3 = C3block(n, n, 3, 1, ratio[2])

        self.bn = BR(nOut)
        self.add = add

    def forward(self, input):
        '''
        :param input: input feature map
        :return: transformed feature map
        '''
        # reduce
        output1 = self.c1(input)
        d1 = self.d1(output1)
        d2 = self.d2(output1)
        d3 = self.d3(output1)

        combine = paddle.concat([d1, d2, d3], 1)
        if self.add:
            combine = paddle.add(input, combine)
        output = self.bn(combine)
        return output

class InputProjectionA(nn.Layer):
    '''
    This class projects the input image to the same spatial dimensions as the feature map.
    For example, if the input image is 512 x512 x3 and spatial dimensions of feature map size are 56x56xF, then
    this class will generate an output of 56x56x3
    '''

    def __init__(self, samplingTimes):
        '''
        :param samplingTimes: The rate at which you want to down-sample the image
        '''
        super().__init__()
        self.pool = nn.LayerList()
        for i in range(0, samplingTimes):
            # pyramid-based approach for down-sampling
            self.pool.append(nn.AvgPool2D(2, stride=2, padding=0))

    def forward(self, input):
        '''
        :param input: Input RGB Image
        :return: down-sampled image (pyramid-based approach)
        '''
        for pool in self.pool:
            input = pool(input)
        return input

class ExtremeC3NetCoarse(nn.Layer):
    '''
    This class defines the ESPNet-C network in the paper
    '''

    def __init__(self, classes=20, p=5, q=3):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()


        self.level1 = CBR(3, basic_0, 3, 2)
        self.sample1 = InputProjectionA(1)
        self.sample2 = InputProjectionA(2)

        self.b1 = BR(basic_0 + 3)
        self.level2_0 = Down_advancedC3(basic_0 + 3, basic_1, ratio=[1, 2, 3])

        self.level2 = nn.LayerList()
        for i in range(0, p):
            self.level2.append(AdvancedC3(basic_1, basic_1, ratio=[1, 3, 4]))
        self.b2 = BR(basic_1 * 2 + 3)

        self.level3_0 = AdvancedC3(basic_1 * 2 + 3, basic_2, add=False, ratio=[1, 3, 5])

        self.level3 = nn.LayerList()
        for i in range(0, q):
            self.level3.append(AdvancedC3(basic_2, basic_2))
        self.b3 = BR(basic_2 * 2)


        self.Coarseclassifier = C(basic_2*2, classes, 1, 1)

    def forward(self, input):
        '''
        :param input: Receives the input RGB image
        :return: the transformed feature map with spatial dimensions 1/8th of the input image
        '''
        output0 = self.level1(input)
        inp1 = self.sample1(input)
        inp2 = self.sample2(input)

        output0_cat = self.b1(paddle.concat([output0, inp1], 1))
        output1_0 = self.level2_0(output0_cat)  # down-sampled

        for i, layer in enumerate(self.level2):
            if i == 0:
                output1 = layer(output1_0)
            else:
                output1 = layer(output1)

        output1_cat = self.b2(paddle.concat([output1, output1_0, inp2], 1))

        output2_0 = self.level3_0(output1_cat)  # down-sampled
        for i, layer in enumerate(self.level3):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2)

        output2_cat = self.b3(paddle.concat([output2_0, output2], 1))

        classifier = self.Coarseclassifier(output2_cat)
        return classifier

class ExtremeC3Net(nn.Layer):
    '''
    This class defines the ESPNet-C network in the paper
    '''

    def __init__(self, classes=20, p=5, q=3):
        '''
        :param classes: number of classes in the dataset. Default is 20 for the cityscapes
        :param p: depth multiplier
        :param q: depth multiplier
        '''
        super().__init__()


        self.encoder = ExtremeC3NetCoarse(classes, p, q)
        # load the encoder modules; the coarse classifier head is not needed
        del self.encoder.Coarseclassifier

        self.upsample = nn.Sequential(
            nn.Conv2D(kernel_size=(1, 1), in_channels=basic_2*2, out_channels=basic_3, bias_attr=False),
            nn.BatchNorm2D(basic_3),
            nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2)
        )

        self.Fine = nn.Sequential(
            C(3, basic_3, 3, 2),
            AdvancedC3(basic_3, basic_3, add=True),
        )
        self.classifier = nn.Sequential(
            BR(basic_3),
            nn.Upsample(mode='bilinear', align_corners=True, align_mode=0, scale_factor=2),
            nn.Conv2D(kernel_size=(1, 1), in_channels=basic_3, out_channels=classes, bias_attr=False),
        )

    def forward(self, input):
        '''
        :param input: Receives the input RGB image
        :return: the transformed feature map with spatial dimensions 1/8th of the input image
        '''
        output0 = self.encoder.level1(input)
        inp1 = self.encoder.sample1(input)
        inp2 = self.encoder.sample2(input)

        output0_cat = self.encoder.b1(paddle.concat([output0, inp1], 1))
        output1_0 = self.encoder.level2_0(output0_cat)  # down-sampled

        for i, layer in enumerate(self.encoder.level2):
            if i == 0:
                output1 = layer(output1_0)
            else:
                output1 = layer(output1)

        output1_cat = self.encoder.b2(paddle.concat([output1, output1_0, inp2], 1))

        output2_0 = self.encoder.level3_0(output1_cat)  # down-sampled
        for i, layer in enumerate(self.encoder.level3):
            if i == 0:
                output2 = layer(output2_0)
            else:
                output2 = layer(output2)

        output2_cat = self.encoder.b3(paddle.concat([output2_0, output2], 1))

        Coarse = self.upsample(output2_cat)
        Fine = self.Fine(input)
        classifier = self.classifier(paddle.add(Coarse, Fine))        
        return classifier
   
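And a similar smoke test for ExtremeC3Net (again a sketch; classes=2 matches the two-class portrait/background task, while p and q keep the defaults from the class definition):

# Smoke test for the ExtremeC3Net implementation above; hyper-parameters
# are illustrative (classes=2 for portrait vs. background).
model = ExtremeC3Net(classes=2, p=5, q=3)
model.eval()
x = paddle.randn([1, 3, 224, 224])
y = model(x)
print(y.shape)  # expected: [1, 2, 224, 224]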
