TilePositionalEmbedding

class torchtune.models.clip.TilePositionalEmbedding(max_num_tiles: int, embed_dim: int)[source]

Positional embedding for tiles: different for every tile, and identical for every token within a tile.

Note that a tile is different from a patch (token). For details, see the documentation of torchtune.modules.vision_transformer.VisionTransformer.

Parameters:
  • max_num_tiles (int) – The maximum number of tiles an image can be divided into.

  • embed_dim (int) – The dimensionality of each tile embedding.
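
A minimal construction sketch based on the signature above; the sizes used here (4 tiles, 512-dim embeddings) are illustrative, not library defaults.

  from torchtune.models.clip import TilePositionalEmbedding

  # Support images cropped into at most 4 tiles, each embedded in 512 dimensions.
  tile_pos_emb = TilePositionalEmbedding(max_num_tiles=4, embed_dim=512)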

forward(x: Tensor, aspect_ratio: Tensor) → Tensor[source]
Parameters:
  • x (torch.Tensor) – Input tensor with shape (bsz * n_imgs, n_tiles, n_tokens_per_tile, embed_dim).

  • aspect_ratio (torch.Tensor) – Tensor with shape (bsz * n_imgs, 2), giving the aspect ratio of each image before tile-cropping, e.g. (2, 1).

Returns:

The input tensor with positional embeddings added.

Return type:

torch.Tensor
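
A usage sketch for forward(), assuming the shapes documented above; the batch size, tile count, and token count below are illustrative.

  import torch

  from torchtune.models.clip import TilePositionalEmbedding

  tile_pos_emb = TilePositionalEmbedding(max_num_tiles=4, embed_dim=512)

  # Two images, each tile-cropped into 2 tiles of 197 tokens apiece
  # (e.g. 14x14 patches plus a CLS token).
  # x: (bsz * n_imgs, n_tiles, n_tokens_per_tile, embed_dim)
  x = torch.randn(2, 2, 197, 512)

  # Each image had aspect ratio 2x1 (2 tiles high, 1 tile wide) before tile-cropping.
  # aspect_ratio: (bsz * n_imgs, 2)
  aspect_ratio = torch.tensor([[2, 1], [2, 1]])

  out = tile_pos_emb(x, aspect_ratio)
  print(out.shape)  # torch.Size([2, 2, 197, 512]) -- same shape, embeddings added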
