    subtitles: introduce ASS codec id and use it. · 7c1a002c
    Clément Bœsch authored
    Currently, we have an AV_CODEC_ID_SSA, which matches the way the ASS/SSA
    markup is muxed in a standalone .ass/.ssa file. This means the AVPacket
    data starts with a "Dialogue:" string, followed by timing information
    (start and end of the event as strings) and a trailing CRLF after each
    line. One packet can contain several lines. We'll refer to this layout
    as "SSA" or "SSA lines".
    
    In matroska, this markup is not stored as such: it has no "Dialogue:"
    prefix, it contains a ReadOrder field, the timing information is not in
    the payload, and it doesn't contain the trailing CRLF. See [1] for more
    info. We'll refer to this layout as "ASS".
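
    To make the difference between the two layouts concrete, here is a
    minimal sketch (not the libavformat code) that turns one "SSA line"
    into an "ASS" payload plus separate timing. The function names, the
    centisecond unit and the caller-supplied read_order are assumptions
    made for illustration; the field order follows the usual ASS
    "Dialogue:" layout (Layer, Start, End, then the remaining fields).

```python
def parse_ass_time(text):
    """Parse an 'H:MM:SS.cc' timestamp into centiseconds."""
    hours, minutes, seconds = text.split(":")
    whole, _, frac = seconds.partition(".")
    # Keep at most two fractional digits (centiseconds).
    centis = int(frac.ljust(2, "0")[:2]) if frac else 0
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 100 + centis


def ssa_line_to_ass_payload(line, read_order):
    """Convert one 'SSA line' into (ass_payload, start_cs, end_cs).

    Hypothetical helper: read_order is supplied by the caller because
    the SSA-line layout carries no such field.
    """
    prefix = "Dialogue:"
    if not line.startswith(prefix):
        raise ValueError("not a Dialogue line")
    # Drop the prefix and the trailing CRLF.
    body = line[len(prefix):].rstrip("\r\n").lstrip()
    # Layer, Start and End come first; everything after stays untouched,
    # including commas inside the text field.
    layer, start, end, rest = body.split(",", 3)
    payload = f"{read_order},{layer},{rest}"
    return payload, parse_ass_time(start), parse_ass_time(end)
```

    The timing leaves the payload entirely, so in this sketch it would
    have to be carried by the packet pts and duration instead.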
    
    Since we have only one common codec for both formats, the matroska
    demuxer constructs an AVPacket following the "SSA lines" format.
    This causes several problems, so it was decided to switch to clean
    ASS packets.
    
    Some insight about what is changed or unchanged in this commit:
    
      CODECS
      ------
    
      - the decoding process still writes "SSA lines" markup inside the ass
        fields of the subtitle rectangles (sub->rects[n]->ass), which is
        still the current common way of representing decoded subtitle
        markup. It is meant to change later.
    
      - new ASS codec id: AV_CODEC_ID_ASS (which is different from the
        legacy AV_CODEC_ID_SSA)
    
      - lavc/assdec: the "ass" decoder is renamed to "ssa" for consistency
        with the codec id, which makes room for a real ass decoder. This
        ass decoder receives clean ASS lines (each starts with a ReadOrder,
        followed by the Layer, etc). We make sure this is decoded properly
        into a new ass-line rectangle of the decoded subtitles (the ssa
        decoder OTOH does a simple straightforward copy). Using the packet
        timing instead of the data string makes sure the ass-line now
        contains the appropriate timing.
    
      - lavc/assenc: just like the ass decoder, the "ass" encoder is renamed
        to "ssa" for consistency with the codec id, which makes room for a
        real "ass" encoder.
    
        One important thing about this encoder is that it only supports one
        ass rectangle: we could have put several dialogue events in the
        AVPacket (separated by a \0 for instance) but this would have caused
        trouble for the muxer, which needs not only the start time but also
        the duration: typically, you have merged events with the same start
        time (stored in the AVPacket->pts) but a different duration. At the
        moment, only matroska does the merge with the SSA-line codec.
    
        In the future, we will need to make sure no decoder can add more
        than one rectangle (and only one Dialogue line in it, obviously).
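
    As an illustration of the new ass decoder path described above, the
    following sketch rebuilds a "Dialogue:" line from a clean ASS payload
    using the packet timing rather than any timing string in the data.
    The function names and the centisecond unit for pts and duration are
    assumptions for the example, not the lavc/assdec implementation.

```python
def format_ass_time(centis):
    """Format centiseconds as an 'H:MM:SS.cc' timestamp."""
    hours, rem = divmod(centis, 360000)
    minutes, rem = divmod(rem, 6000)
    seconds, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{cs:02d}"


def ass_payload_to_ssa_line(payload, pts_cs, duration_cs):
    """Build one 'SSA line' from an ASS payload and packet timing.

    Hypothetical helper: pts_cs and duration_cs stand in for the
    AVPacket pts and duration, assumed to be in centiseconds here.
    """
    # ReadOrder and Layer come first; the rest (Style ... Text) is copied
    # as-is, so commas inside the text field survive.
    read_order, layer, rest = payload.split(",", 2)
    start = format_ass_time(pts_cs)
    end = format_ass_time(pts_cs + duration_cs)
    # ReadOrder has no place in the SSA-line layout, so it is dropped.
    return f"Dialogue: {layer},{start},{end},{rest}\r\n"
```

    Because each payload maps to exactly one start time and one
    duration, the sketch handles a single event per call, in line with
    the one-rectangle restriction described above.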
    
      FORMATS
      -------
    
      - lavf/assenc: the .ass/.ssa muxer can take both SSA and ASS packets.
        In the case of ASS packets as input, it adds the timing based on the
        AVPacket pts and duration, and muxes it with "Dialogue:", trailing
        CRLF, etc.
    
      - lavf/assdec: unchanged; it currently still only outputs SSA-lines
        packets.
    
      - lavf/mkv: the demuxer can now output ASS packets without needing
        any "SSA-lines" reconstruction hack. It will become the default at
        the next libavformat bump, and the SSA support will be dropped from
        the demuxer. The muxer can take ASS packets since it's muxed
        normally, and still supports the old SSA packets. All the SSA
        support and hacks in Matroska code will be dropped at the next lavf
        bump.
    
    [1]: http://www.matroska.org/technical/specs/subtitles/ssa.html