The channel dimension of single-channel grayscale images is lost when the dataset is wrapped in a dataloader. #1417

Description

@AKALeon

My device is Apple M2 Ultra, and the torch version is 0.16.3.

When I tested mnist_dataset, I got the following error:

library(torch)
library(torchvision)
library(luz)

dir <- "torch-datasets"

train_ds <- mnist_dataset(root = dir,
                          download = TRUE,
                          train = TRUE,
                          transform = transform_to_tensor)

valid_ds <- mnist_dataset(root = dir,
                          download = TRUE,
                          train = FALSE,
                          transform = transform_to_tensor)

train_dl <- dataloader(dataset = train_ds, batch_size = 128, shuffle = TRUE)
valid_dl <- dataloader(dataset = valid_ds, batch_size = 128)

convnet <- nn_module(
  classname = "convnet",
  initialize = function() {
    # nn_conv2d(in_channels, out_channels, kernel_size, stride)
    self$conv1 <- nn_conv2d(in_channels = 1, out_channels = 32, kernel_size = 3, stride = 1)
    self$conv2 <- nn_conv2d(in_channels = 32, out_channels = 64, kernel_size = 3, stride = 2)
    self$conv3 <- nn_conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1)
    self$conv4 <- nn_conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 2)
    self$conv5 <- nn_conv2d(in_channels = 256, out_channels = 10, kernel_size = 3, stride = 2)
  },
  forward = function(x) {
    x %>%
      self$conv1() %>%
      nnf_relu() %>%
      self$conv2() %>%
      nnf_relu() %>%
      self$conv3() %>%
      nnf_relu() %>%
      self$conv4() %>%
      nnf_relu() %>%
      self$conv5() %>%
      torch_squeeze()
  }
)

fitted <- convnet %>%
  setup(
    loss = nn_cross_entropy_loss(),
    optimizer = optim_adam,
    metrics = list(
      luz_metric_accuracy()
    )
  ) %>%
  fit(data = train_dl, epochs = 10, valid_data = valid_dl)
Error in (function (input, weight, bias, stride, padding, dilation, groups) :
Given groups=1, weight of size [32, 1, 3, 3], expected input[1, 128, 28, 28] to have 1 channels, but got 128 channels instead

Based on the error message, I concluded that the problem is tied to MNIST images having a single channel, because everything worked fine when I tried 3-channel RGB images.
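One way to read the error message: the convolution received a 3-D tensor of shape 128, 28, 28 and treated it as a single unbatched image in (channels, height, width) layout, which is why it reports input[1, 128, 28, 28] with 128 channels. The sketch below tries to reproduce that reading with a random tensor standing in for the batch; the random input is an assumption for illustration, not data from mnist_dataset.

```r
library(torch)

conv <- nn_conv2d(in_channels = 1, out_channels = 32, kernel_size = 3)

# Stand-in for a batch that has no channel dimension: (batch, height, width).
x <- torch_randn(128, 28, 28)

# A 3-D input is taken as one unbatched image (channels, height, width),
# so this call is expected to fail with a channel-mismatch error.
res <- try(conv(x), silent = TRUE)
inherits(res, "try-error")

# Inserting a channel dimension at position 2 (dimensions are 1-based in
# torch for R) gives the (batch, channels, height, width) layout.
y <- conv(x$unsqueeze(2))
y$shape
```

If this reading is right, the model itself is fine and the missing dimension has to be restored somewhere between the dataset and the first convolution.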

For a single MNIST image, the model runs without error:

model <- convnet()
model(train_ds$.getitem(1)$x)
torch_tensor
0.01 *
 1.0561
 2.1136
-2.7016
-0.4070
-0.2968
 0.5746
-2.0773
-1.7271
 1.8103
-0.1465
[ CPUFloatType{10} ][ grad_fn = <SqueezeBackward0> ]

Finally, I inspected a batch and found that the problem lies there: the channel dimension is missing. The shape should be 128, 1, 28, 28. How should I solve this?

batch <- train_dl |> dataloader_make_iter() |> dataloader_next()
batch$x$shape
[1] 128  28  28
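A possible workaround, sketched under the assumption that batches keep arriving as (batch, height, width): restore the channel dimension at the start of forward. The helper add_channel_dim is hypothetical (it is not part of torch or torchvision), and a random tensor stands in for the dataloader batch. This only works around the symptom; it does not explain why the batch loses the dimension.

```r
library(torch)

# Hypothetical helper: add a channel dimension when the input is 3-D.
# Dimensions are 1-based in torch for R, so position 2 is the channel slot.
add_channel_dim <- function(x) {
  if (length(x$shape) == 3) x$unsqueeze(2) else x
}

convnet <- nn_module(
  classname = "convnet",
  initialize = function() {
    self$conv1 <- nn_conv2d(in_channels = 1, out_channels = 32, kernel_size = 3, stride = 1)
    self$conv2 <- nn_conv2d(in_channels = 32, out_channels = 64, kernel_size = 3, stride = 2)
    self$conv3 <- nn_conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1)
    self$conv4 <- nn_conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 2)
    self$conv5 <- nn_conv2d(in_channels = 256, out_channels = 10, kernel_size = 3, stride = 2)
  },
  forward = function(x) {
    x <- add_channel_dim(x)          # (128, 28, 28) -> (128, 1, 28, 28)
    x <- nnf_relu(self$conv1(x))
    x <- nnf_relu(self$conv2(x))
    x <- nnf_relu(self$conv3(x))
    x <- nnf_relu(self$conv4(x))
    torch_squeeze(self$conv5(x))     # (128, 10, 1, 1) -> (128, 10)
  }
)

model <- convnet()
out <- model(torch_randn(128, 28, 28))  # stand-in for batch$x
out$shape
```

Note that the helper also turns a single 1, 28, 28 image into a batch of one, so the single-image call from earlier should still return ten values.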
