• j4k3@lemmy.world
    4 months ago

    It likely doesn’t know what “womens” means, but it is funny. The minor errors indicate a comprehension issue.

    • Pingudiem@lemmy.world
      4 months ago

      English is not my first language, but wouldn’t that be the correct form for ownership of things in regard to the plural of woman? Like a man’s world, women’s attire? In my native language that would make sense.

      • j4k3@lemmy.world
        4 months ago

        Every character matters to an AI model, including the spaces between words and the punctuation. “Womens” is not a word in English. “Women” is already the plural of “woman”, so the possessive needs an ’s: “women’s”.

        With image generators, the tokenized model input is harder to inspect, because tools for viewing it are not integrated into Automatic1111 or ComfyUI by default the way they are in Oobabooga Textgen for LLMs. Monitoring the tokenized input would show whether the word was omitted entirely or broken into smaller pieces, possibly down to single letters, or at least that is how LLMs handle tokenization.
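
        To make the splitting idea concrete, here is a minimal sketch of a greedy longest-match subword tokenizer. Everything in it is an assumption for illustration: `TOY_VOCAB` is a made-up vocabulary, and `toy_tokenize` is a hypothetical helper, not the tokenizer that CLIP, Automatic1111, or ComfyUI actually use (real tokenizers typically use merge-based BPE with a far larger vocabulary). It only shows how a word that is not in the vocabulary gets broken into smaller known pieces, down to single letters if nothing longer matches.

```python
# Toy vocabulary: hypothetical, NOT the real CLIP or LLM vocabulary.
TOY_VOCAB = {
    "women", "woman", "'s", "attire", "medieval",
    # single-character fallbacks
    "w", "o", "m", "e", "n", "s", "'", "a", "t", "i", "r",
}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of one word into known subword pieces."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing matched, not even one character: emit a placeholder.
            pieces.append("<unk>")
            i += 1
    return pieces


def tokenize_prompt(prompt):
    """Lowercase the prompt, split on whitespace, tokenize each word."""
    return [toy_tokenize(word) for word in prompt.lower().split()]


if __name__ == "__main__":
    for word in ("women's", "womens", "wmn"):
        print(word, "->", toy_tokenize(word))
```

        With this toy vocabulary, “women's” comes out as the pieces `women` and `'s`, while “womens” comes out as `women` plus a stray `s`, and a word with no longer match at all falls apart into single letters. Whether a real model treats those pieces the same way depends on its own vocabulary and training.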

        Always keep in mind that every word and style you use in a prompt must correlate with the tags or captions the training images were labeled with. Many models are trained with natural language sentences, so they have some degree of natural language processing. It is not as sophisticated as in a text-to-text model, where complex special tokens connect the input to the output.
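
        As a rough sanity check of that idea, the sketch below flags prompt words that have no exact match in a tag list. The tag set `KNOWN_TAGS` and the helper `unmatched_terms` are hypothetical; a real model's training captions are not usually available as a simple list, and a real text encoder does not do exact matching, so treat this only as a way to spot obvious typos before submitting a prompt.

```python
import re

# Hypothetical tag set standing in for the tags/captions a model was trained on.
KNOWN_TAGS = {"medieval", "woman", "women", "women's", "tunic", "attire", "portrait"}


def unmatched_terms(prompt, known_tags=KNOWN_TAGS):
    """Return the prompt words with no exact match in the tag set, in order."""
    # Keep letters and straight apostrophes together so "women's" stays one word.
    words = re.findall(r"[a-z']+", prompt.lower())
    return [word for word in words if word not in known_tags]


if __name__ == "__main__":
    print(unmatched_terms("medieval womens attire"))
    print(unmatched_terms("medieval women's attire"))
```

        Here the misspelled “womens” is the only word reported as unmatched, while the prompt with “women's” passes cleanly.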

        The way tokens are processed is a major aspect of the evolution of generative AI. For instance, the first Stable Diffusion 1.x models use CLIP L, which is a very small language processing model. The SDXL models use a dual setup with CLIP G and CLIP L in tandem. The most recent Stable Diffusion model, SD3, uses a triple setup: G and L, along with T5-XXL, a text-to-text large language model.

        I haven’t gone super in depth trying to understand the SD3 codebase, but something weird seems to be happening with the T5, where SD3 appears to swap an entire tensor layer each time the model loads instead of shipping a pretrained model or using a LoRA layering scheme.

        Safety in image generation is different from LLMs. It is not part of the model in the same way that safety works for an LLM. I found it fascinating how SD3 omits human genitalia, and I started looking into the ComfyUI code as a result, because this behavior is deterministic and therefore, as far as I can tell, not part of the actual tensor table maths. The behavior centers around the T5 model…

        Anyways, I’m getting stupid technical on a tangent… What I meant to say is that the text processing and tokenization is external to the tensor tables of the actual generative model. If the processing scheme is complex enough, it might be possible to error correct the prompt, but it is best to assume that the prompt will be processed exactly as it was submitted.

    • Land_Strider@lemmy.world
      4 months ago

      Yeah, it shows a slightly worn tunic alongside pretty much modern, brightly colored, machine-woven microfiber elastic accessories for the hands, arms, and head. It also mixes high-society attire with regular-folk attire.

      Overall a nice picture to look at, but nowhere near accurate.