| 3 | This is the documentation for libsyck and describes how to extend it. |
|---|
| 4 | |
|---|
| 5 | = Overview = |
|---|
| 6 | |
|---|
| 7 | Syck is designed to take a YAML stream and a symbol table and move |
|---|
| 8 | data between the two. Your job is to simply provide callback functions which |
|---|
| 9 | understand the symbol table you are keeping. |
|---|
| 10 | |
|---|
| 11 | Syck also includes a simple symbol table implementation. |
|---|
| 12 | |
|---|
| 13 | == About the Source == |
|---|
| 14 | |
|---|
| 15 | The Syck distribution is laid out as follows: |
|---|
| 16 | |
|---|
| 17 | lib/              libsyck source (core API) |
|---|
| 18 |   bytecode.re     lexer for YAML bytecode (re2c) |
|---|
| 19 |   emitter.c       emitter functions |
|---|
| 20 |   gram.y          grammar for YAML documents (bison) |
|---|
| 21 |   handler.c       internal handlers which glue the lexer and grammar |
|---|
| 22 |   implicit.re     lexer for builtin YAML types (re2c) |
|---|
| 23 |   node.c          node allocation and access |
|---|
| 24 |   syck.c          parser funcs, central funcs |
|---|
| 25 |   syck.h          libsyck definitions |
|---|
| 26 |   syck_st.c       symbol table functions |
|---|
| 27 |   syck_st.h       symbol table definitions |
|---|
| 28 |   token.re        lexer for YAML plaintext (re2c) |
|---|
| 29 |   yaml2byte.c     simple bytecode emitter |
|---|
| 30 | ext/              ruby, python, php, cocoa extensions |
|---|
| 31 | tests/            unit tests for libsyck |
|---|
| 32 |   YTS.c.rb        generates YAML Testing Suite unit test |
|---|
| 33 |                   (use: ruby YTS.c.rb > YTS.c) |
|---|
| 34 |   Basic.c         allocation and buffering tests |
|---|
| 35 |   Parse.c         parser sanity |
|---|
| 36 |   Emit.c          emitter sanity |
|---|
| 37 | |
|---|
| 38 | == Using SyckNodes == |
|---|
| 39 | |
|---|
| 40 | The SyckNode is the structure which YAML data is loaded into |
|---|
| 41 | while parsing. It's also a good structure to use while emitting; |
|---|
| 42 | however, you may choose to emit directly from your native types |
|---|
| 43 | if your extension is very small. |
|---|
| 44 | |
|---|
| 45 | SyckNodes are designed to be used in conjunction with a symbol |
|---|
| 46 | table. More on that in a moment. For now, think of a symbol |
|---|
| 47 | table as a library which stores nodes, assigning each node a |
|---|
| 48 | unique identifier. |
|---|
| 49 | |
|---|
| 50 | This identifier is called the SYMID in Syck. Nodes refer to |
|---|
| 51 | each other by SYMIDs, rather than pointers. This way, the |
|---|
| 52 | nodes can be freed as the parser goes. |
|---|
| 53 | |
|---|
| 54 | To be honest, SYMIDs are used because this is the way Ruby |
|---|
| 55 | works. And this technique means Syck can use Ruby's symbol |
|---|
| 56 | table directly. But the included symbol table is lightweight, |
|---|
| 57 | solves the problem of keeping too much data in memory, and |
|---|
| 58 | simply pairs SYMIDs with your native object type (such as |
|---|
| 59 | PyObject pointers.) |
|---|
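To make the idea of pairing SYMIDs with native objects concrete, here is a minimal sketch of such a table. It is NOT the symbol table shipped in syck_st.c: the names `toy_symid', `toy_symtab' and the `toy_symtab_*' functions are made up for illustration, and the integer width chosen for the id is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for Syck's SYMID type; the width is an assumption. */
typedef unsigned long toy_symid;

/* Hypothetical table: slot (id - 1) holds the object stored under id. */
typedef struct {
    void **objs;
    size_t len;
    size_t cap;
} toy_symtab;

void toy_symtab_init(toy_symtab *t)
{
    t->objs = NULL;
    t->len = 0;
    t->cap = 0;
}

/* Store obj and return its identifier. Ids start at 1 so that 0 can
 * signal "no id" or an allocation failure. */
toy_symid toy_symtab_add(toy_symtab *t, void *obj)
{
    assert(t != NULL);
    if (t->len == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 8;
        void **grown = realloc(t->objs, ncap * sizeof *grown);
        if (grown == NULL)
            return 0;
        t->objs = grown;
        t->cap = ncap;
    }
    t->objs[t->len++] = obj;
    return (toy_symid)t->len;
}

/* Return the object stored under id, or NULL if id is unknown. */
void *toy_symtab_lookup(const toy_symtab *t, toy_symid id)
{
    if (id == 0 || id > t->len)
        return NULL;
    return t->objs[id - 1];
}

/* Release the table itself; the stored objects belong to the caller. */
void toy_symtab_free(toy_symtab *t)
{
    free(t->objs);
    toy_symtab_init(t);
}
```

Because nodes only hold ids, the table is the single place that knows where an object lives, which is what lets the parser throw nodes away early.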
| 60 | |
|---|
| 61 | Three kinds of SyckNodes are available: |
|---|
| 62 | |
|---|
| 63 | 1. scalar nodes (syck_str_kind): |
|---|
| 64 | These nodes store a string, a length for the string |
|---|
| 65 | and a style (indicating the format used in the YAML |
|---|
| 66 | document). |
|---|
| 67 | |
|---|
| 68 | 2. sequence nodes (syck_seq_kind): |
|---|
| 69 | Sequences are YAML's array or list type. |
|---|
| 70 | These nodes store a list of items, whose allocation |
|---|
| 71 | is handled by syck functions. |
|---|
| 72 | |
|---|
| 73 | 3. mapping nodes (syck_map_kind): |
|---|
| 74 | Mappings are YAML's dictionary or hashtable type. |
|---|
| 75 | These nodes store a list of pairs, whose allocation |
|---|
| 76 | is handled by syck functions. |
|---|
| 77 | |
|---|
| 78 | The syck_kind_tag enum defines the three kinds above, |
|---|
| 79 | which can be tested against the SyckNode.kind field. |
|---|
| 80 | |
|---|
| 81 | PLEASE leave the SyckNode.shortcut field alone!! It's |
|---|
| 82 | used by the parser to work around parser ambiguities!! |
|---|
| 83 | |
|---|
| 84 | === Node API === |
|---|
| 85 | |
|---|
| 86 | SyckNode * |
|---|
| 87 | syck_alloc_str() |
|---|
| 88 | syck_alloc_seq() |
|---|
| 89 | syck_alloc_map() |
|---|
| 90 | |
|---|
| 91 | Allocates a node of a given type and initializes its |
|---|
| 92 | internal union to emptiness. When left as-is, these |
|---|
| 93 | nodes operate as a valid empty string, empty sequence |
|---|
| 94 | and empty map. |
|---|
| 95 | |
|---|
| 96 | Remember that the node's id (SYMID) isn't set by the |
|---|
| 97 | allocation functions OR any other node functions herein. |
|---|
| 98 | It's up to your handler function to do that. |
|---|
| 99 | |
|---|
| 100 | void |
|---|
| 101 | syck_free_node( SyckNode *n ) |
|---|
| 102 | |
|---|
| 103 | While the Syck parser will free nodes it creates, use |
|---|
| 104 | this to free your own nodes. This function will free |
|---|
| 105 | all of its internals, its type_id and its anchor. If |
|---|
| 106 | you don't want those members freed, please be sure they |
|---|
| 107 | are set to NULL. |
|---|
| 108 | |
|---|
| 109 | SyckNode * |
|---|
| 110 | syck_new_str( char *str, enum scalar_style style ) |
|---|
| 111 | syck_new_str2( char *str, long len, enum scalar_style style ) |
|---|
| 112 | |
|---|
| 113 | Creates scalar nodes from C strings. The first function |
|---|
| 114 | will call strlen() to determine length. |
|---|
| 115 | |
|---|
| 116 | void |
|---|
| 117 | syck_replace_str( SyckNode *n, char *str, enum scalar_style style ) |
|---|
| 118 | syck_replace_str2( SyckNode *n, char *str, long len, enum scalar_style style ) |
|---|
| 119 | |
|---|
| 120 | Replaces the string content of a node `n', while keeping |
|---|
| 121 | the node's type_id, anchor and id. |
|---|
| 122 | |
|---|
| 123 | char * |
|---|
| 124 | syck_str_read( SyckNode *n ) |
|---|
| 125 | |
|---|
| 126 | Returns a pointer to the null-terminated string inside scalar node |
|---|
| 127 | `n'. Normally, you might just want to use: |
|---|
| 128 | |
|---|
| 129 | char *ptr = n->data.str->ptr; |
|---|
| 130 | long len = n->data.str->len; |
|---|
| 131 | |
|---|
| 132 | SyckNode * |
|---|
| 133 | syck_new_map( SYMID key, SYMID value ) |
|---|
| 134 | |
|---|
| 135 | Allocates a new map with an initial pair of nodes. |
|---|
| 136 | |
|---|
| 137 | void |
|---|
| 138 | syck_map_empty( SyckNode *n ) |
|---|
| 139 | |
|---|
| 140 | Empties the set of pairs for a mapping node. |
|---|
| 141 | |
|---|
| 142 | void |
|---|
| 143 | syck_map_add( SyckNode *n, SYMID key, SYMID value ) |
|---|
| 144 | |
|---|
| 145 | Pushes a key-value pair on the mapping. While the ordering |
|---|
| 146 | of pairs DOES affect the ordering of pairs on output, loaded |
|---|
| 147 | nodes are deliberately out of order (since YAML mappings do |
|---|
| 148 | not preserve ordering.) |
|---|
| 149 | |
|---|
| 150 | See YAML's builtin !omap type for ordering in mapping nodes. |
|---|
| 151 | |
|---|
| 152 | SYMID |
|---|
| 153 | syck_map_read( SyckNode *n, enum map_part, long index ) |
|---|
| 154 | |
|---|
| 155 | Loads a specific key or value from position `index' within |
|---|
| 156 | a mapping node. Great for iteration: |
|---|
| 157 | |
|---|
| 158 | for ( i = 0; i < syck_map_count( n ); i++ ) { |
|---|
| 159 | SYMID key = syck_map_read( n, map_key, i ); |
|---|
| 160 | SYMID val = syck_map_read( n, map_value, i ); |
|---|
| 161 | } |
|---|
| 162 | |
|---|
| 163 | void |
|---|
| 164 | syck_map_assign( SyckNode *n, enum map_part, long index, SYMID id ) |
|---|
| 165 | |
|---|
| 166 | Replaces a specific key or value at position `index' within |
|---|
| 167 | a mapping node. Useful for replacement only, will not allocate |
|---|
| 168 | more room when assigned beyond the end of the pair list. |
|---|
| 169 | |
|---|
| 170 | long |
|---|
| 171 | syck_map_count( SyckNode *n ) |
|---|
| 172 | |
|---|
| 173 | Returns a count of the pairs contained by the mapping node. |
|---|
| 174 | |
|---|
| 175 | void |
|---|
| 176 | syck_map_update( SyckNode *n, SyckNode *n2 ) |
|---|
| 177 | |
|---|
| 178 | Combines all pairs from mapping node `n2' into mapping node |
|---|
| 179 | `n'. |
|---|
| 180 | |
|---|
| 181 | SyckNode * |
|---|
| 182 | syck_new_seq( SYMID val ) |
|---|
| 183 | |
|---|
| 184 | Allocates a new seq with an entry `val'. |
|---|
| 185 | |
|---|
| 186 | void |
|---|
| 187 | syck_seq_empty( SyckNode *n ) |
|---|
| 188 | |
|---|
| 189 | Empties a sequence node `n'. |
|---|
| 190 | |
|---|
| 191 | void |
|---|
| 192 | syck_seq_add( SyckNode *n, SYMID val ) |
|---|
| 193 | |
|---|
| 194 | Pushes a new item `val' onto the end of the sequence. |
|---|
| 195 | |
|---|
| 196 | void |
|---|
| 197 | syck_seq_assign( SyckNode *n, long index, SYMID val ) |
|---|
| 198 | |
|---|
| 199 | Replaces the item at position `index' in the sequence |
|---|
| 200 | node with item `val'. Useful for replacement only, will not allocate |
|---|
| 201 | more room when assigned beyond the end of the item list. |
|---|
| 202 | |
|---|
| 203 | SYMID |
|---|
| 204 | syck_seq_read( SyckNode *n, long index ) |
|---|
| 205 | |
|---|
| 206 | Reads the item at position `index' in the sequence node. |
|---|
| 207 | Again, for iteration: |
|---|
| 208 | |
|---|
| 209 | for ( i = 0; i < syck_seq_count( n ); i++ ) { |
|---|
| 210 | SYMID val = syck_seq_read( n, i ); |
|---|
| 211 | } |
|---|
| 212 | |
|---|
| 213 | long |
|---|
| 214 | syck_seq_count( SyckNode *n ) |
|---|
| 215 | |
|---|
| 216 | Returns a count of items contained by sequence node `n'. |
|---|
| 217 | |
|---|
| 218 | == YAML Parser == |
|---|
| 219 | |
|---|
| 220 | Syck's YAML parser is extremely simple. After setting up a |
|---|
| 221 | SyckParser struct, along with callback functions for loading |
|---|
| 222 | node data, use syck_parse() to start reading data. Since |
|---|
| 223 | syck_parse() only reads single documents, the stream can be |
|---|
| 224 | managed by calling syck_parse() repeatedly for an IO source. |
|---|
| 225 | |
|---|
| 226 | The parser has four callbacks: one for reading from the IO |
|---|
| 227 | source, one for handling errors that show up, one for |
|---|
| 228 | handling nodes as they come in, one for handling bad |
|---|
| 229 | anchors in the document. Nodes are loaded in the order they |
|---|
| 230 | appear in the YAML document, however nested nodes are loaded |
|---|
| 231 | before their parent. |
|---|
| 232 | |
|---|
| 233 | === How to Write a Node Handler === |
|---|
| 234 | |
|---|
| 235 | Inside the node handler, the normal process should be: |
|---|
| 236 | |
|---|
| 237 | 1. Convert the SyckNode data to a structure meaningful |
|---|
| 238 | to your application. |
|---|
| 239 | |
|---|
| 240 | 2. Check for the bad anchor caveat described in the |
|---|
| 241 | next section. |
|---|
| 242 | |
|---|
| 243 | 3. Add the new structure to the symbol table attached |
|---|
| 244 | to the parser (found at parser->syms). |
|---|
| 245 | |
|---|
| 246 | 4. Return the SYMID reserved in the symbol table. |
|---|
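The four steps above can be modelled without linking against libsyck. The sketch below is only that, a model: `toy_node', `toy_parser', `app_value' and `toy_handler' are hypothetical stand-ins for SyckNode, SyckParser, your native type and your node handler, and the fixed-size array stands in for the table at parser->syms.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long toy_symid;               /* stand-in for SYMID */
enum toy_kind { toy_map, toy_seq, toy_str };   /* stand-in for syck_kind_tag */

/* Stand-in for SyckNode: only the fields this sketch needs. */
typedef struct {
    toy_symid id;        /* 0 unless an id was reserved earlier */
    enum toy_kind kind;
    const char *str;     /* scalar text, used when kind == toy_str */
    long count;          /* item or pair count for the other kinds */
} toy_node;

/* Application-side value built from a node. */
typedef struct {
    enum toy_kind kind;
    const char *text;
    long count;
} app_value;

#define TOY_MAX_SYMS 16

/* Stand-in for the parser and its attached symbol table. */
typedef struct {
    app_value syms[TOY_MAX_SYMS];
    toy_symid used;
} toy_parser;

toy_symid toy_handler(toy_parser *p, const toy_node *n)
{
    app_value v;
    toy_symid id;

    assert(p != NULL && n != NULL);

    /* 1. Convert the node into an application structure. A real
     *    handler should copy the text, since the node is temporary. */
    v.kind = n->kind;
    v.text = (n->kind == toy_str) ? n->str : NULL;
    v.count = (n->kind == toy_str) ? 0 : n->count;

    /* 2. Bad anchor caveat: reuse an id that was reserved earlier. */
    if (n->id != 0 && n->id <= p->used) {
        p->syms[n->id - 1] = v;
        return n->id;
    }

    /* 3. Otherwise add the structure to the symbol table. */
    if (p->used == TOY_MAX_SYMS)
        return 0;        /* table full */
    id = ++p->used;
    p->syms[id - 1] = v;

    /* 4. Return the id under which it was stored. */
    return id;
}
```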
| 247 | |
|---|
| 248 | === Nodes and Memory Allocation === |
|---|
| 249 | |
|---|
| 250 | One thing about SyckNodes passed into your handler: |
|---|
| 251 | Syck WILL free the node once your handler is done with it. |
|---|
| 252 | The node is temporary. So, if you plan on keeping a node |
|---|
| 253 | around, you'll need to make yourself a new copy. |
|---|
| 254 | |
|---|
| 255 | And you'll probably need to reassign all the items |
|---|
| 256 | in a sequence and pairs in a map. You can do this |
|---|
| 257 | with syck_seq_assign() and syck_map_assign(). But, before |
|---|
| 258 | you do that, you might consider using your own node structure |
|---|
| 259 | that fits your application better. |
|---|
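As a rough illustration of "using your own node structure", the sketch below copies a scalar's bytes into an application-owned string before the temporary node goes away. The `toy_str' struct only mimics the ptr/len pair of a scalar node; `app_string', `app_string_copy' and `app_string_free' are hypothetical names, not part of libsyck.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the string part of a scalar node (ptr plus len). */
typedef struct {
    char *ptr;
    long len;
} toy_str;

/* Application-owned copy that outlives the parser's node. */
typedef struct {
    char *text;   /* always null-terminated */
    long len;
} app_string;

/* Duplicate the scalar's bytes. Returns NULL on allocation failure;
 * release the result with app_string_free(). */
app_string *app_string_copy(const toy_str *s)
{
    app_string *copy;

    assert(s != NULL && s->len >= 0);
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    copy->text = malloc((size_t)s->len + 1);
    if (copy->text == NULL) {
        free(copy);
        return NULL;
    }
    /* Copy by length rather than strlen(), in case the scalar holds
     * embedded '\0' bytes. */
    if (s->len > 0)
        memcpy(copy->text, s->ptr, (size_t)s->len);
    copy->text[s->len] = '\0';
    copy->len = s->len;
    return copy;
}

void app_string_free(app_string *a)
{
    if (a != NULL) {
        free(a->text);
        free(a);
    }
}
```

Storing the copy (rather than the node) in the symbol table keeps the table valid after the parser has freed the node.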
| 260 | |
|---|
| 261 | === A Note About Anchors in Parsing === |
|---|
| 262 | |
|---|
| 263 | YAML anchors can be recursive. This means deeper alias nodes |
|---|
| 264 | can be loaded before the anchor. This is the trickiest part |
|---|
| 265 | of the loading process. |
|---|
| 266 | |
|---|
| 267 | Assuming this YAML document: |
|---|
| 268 | |
|---|
| 269 | --- &a [*a] |
|---|
| 270 | |
|---|
| 271 | The loading process is: |
|---|
| 272 | |
|---|
| 273 | 1. Load alias *a by calling parser->bad_anchor_handler, which |
|---|
| 274 | reserves a SYMID in the symbol table. |
|---|
| 275 | |
|---|
| 276 | 2. The `a' anchor is added to Syck's own anchor table, |
|---|
| 277 | referencing the SYMID above. |
|---|
| 278 | |
|---|
| 279 | 3. When the anchor &a is found, the SyckNode created is |
|---|
| 280 | given the SYMID of the bad anchor node above. (Usually |
|---|
| 281 | nodes created at this stage have the `id' blank.) |
|---|
| 282 | |
|---|
| 283 | 4. The parser->handler function is called with that node. |
|---|
| 284 | Check for node->id in the handler and overwrite the |
|---|
| 285 | bad anchor node with the new node. |
|---|
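The reserve-then-overwrite dance for `--- &a [*a]' can be sketched as a small model. Nothing here is libsyck API: `toy_table', `toy_reserve_bad_anchor' and `toy_store' are hypothetical names that play the roles of the symbol table, the bad anchor handler and the node handler.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long toy_symid;   /* stand-in for SYMID */

#define TOY_MAX 8

/* One symbol table slot: either a placeholder or a loaded value. */
typedef struct {
    int is_placeholder;
    const char *value;             /* description of the loaded object */
} toy_slot;

typedef struct {
    toy_slot slots[TOY_MAX];
    toy_symid used;
} toy_table;

/* Step 1: the alias is seen before its anchor is complete, so reserve
 * an id that holds a placeholder. Returns 0 if the table is full. */
toy_symid toy_reserve_bad_anchor(toy_table *t)
{
    assert(t != NULL);
    if (t->used == TOY_MAX)
        return 0;
    t->slots[t->used].is_placeholder = 1;
    t->slots[t->used].value = NULL;
    return ++t->used;
}

/* Step 4: the handler receives a node. A non-zero node_id means the
 * id was reserved in step 1, so overwrite that slot instead of adding. */
toy_symid toy_store(toy_table *t, toy_symid node_id, const char *value)
{
    assert(t != NULL);
    if (node_id != 0 && node_id <= t->used) {
        t->slots[node_id - 1].is_placeholder = 0;
        t->slots[node_id - 1].value = value;
        return node_id;
    }
    if (t->used == TOY_MAX)
        return 0;
    t->slots[t->used].is_placeholder = 0;
    t->slots[t->used].value = value;
    return ++t->used;
}
```

The point of overwriting in place is that anything which already recorded the reserved id (here, the alias inside the sequence) ends up pointing at the real object without being touched again.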
| 286 | |
|---|
| 287 | === Parser API === |
|---|
| 288 | |
|---|
| 289 | See <syck.h> for layouts of SyckParser and SyckNode. |
|---|
| 290 | |
|---|
| 291 | SyckParser * |
|---|
| 292 | syck_new_parser() |
|---|
| 293 | |
|---|
| 294 | Creates a new Syck parser. |
|---|
| 295 | |
|---|
| 296 | void |
|---|
| 297 | syck_free_parser( SyckParser *p ) |
|---|
| 298 | |
|---|
| 299 | Frees the parser, as well as associated symbol tables |
|---|
| 300 | and buffers. |
|---|
| 301 | |
|---|
| 302 | void |
|---|
| 303 | syck_parser_implicit_typing( SyckParser *p, int on ) |
|---|
| 304 | |
|---|
| 305 | Toggles implicit typing of builtin YAML types. If |
|---|
| 306 | this is passed a zero, YAML builtin types will be |
|---|
| 307 | ignored (!int, !float, etc.). The default is 1. |
|---|
| 308 | |
|---|
| 309 | void |
|---|
| 310 | syck_parser_taguri_expansion( SyckParser *p, int on ) |
|---|
| 311 | |
|---|
| 312 | Toggles expansion of types in full taguri. This |
|---|
| 313 | defaults to 1 and is recommended to stay as 1. |
|---|
| 314 | Turning this off removes a layer of abstraction |
|---|
| 315 | that will cause incompatibilities between YAML |
|---|
| 316 | documents of differing versions. |
|---|
| 317 | |
|---|
| 318 | void |
|---|
| 319 | syck_parser_handler( SyckParser *p, SyckNodeHandler h ) |
|---|
| 320 | |
|---|
| 321 | Assign a callback function as a node handler. The |
|---|
| 322 | SyckNodeHandler signature looks like this: |
|---|
| 323 | |
|---|
| 324 | SYMID node_handler( SyckParser *p, SyckNode *n ) |
|---|
| 325 | |
|---|
| 326 | void |
|---|
| 327 | syck_parser_error_handler( SyckParser *p, SyckErrorHandler h ) |
|---|
| 328 | |
|---|
| 329 | Assign a callback function as an error handler. The |
|---|
| 330 | SyckErrorHandler signature looks like this: |
|---|
| 331 | |
|---|
| 332 | void error_handler( SyckParser *p, char *str ) |
|---|
| 333 | |
|---|
| 334 | void |
|---|
| 335 | syck_parser_bad_anchor_handler( SyckParser *p, SyckBadAnchorHandler h ) |
|---|
| 336 | |
|---|
| 337 | Assign a callback function as a bad anchor handler. |
|---|
| 338 | The SyckBadAnchorHandler signature looks like this: |
|---|
| 339 | |
|---|
| 340 | SyckNode *bad_anchor_handler( SyckParser *p, char *anchor ) |
|---|
| 341 | |
|---|
| 342 | void |
|---|
| 343 | syck_parser_file( SyckParser *p, FILE *f, SyckIoFileRead r ) |
|---|
| 344 | |
|---|
| 345 | Assigns a FILE pointer as an IO source and a callback function |
|---|
| 346 | which handles buffering of that IO source. |
|---|
| 347 | |
|---|
| 348 | The SyckIoFileRead signature looks like this: |
|---|
| 349 | |
|---|
| 350 | long SyckIoFileRead( char *buf, SyckIoFile *file, long max_size, long skip ); |
|---|
| 351 | |
|---|
| 352 | Syck comes with a default FILE handler named `syck_io_file_read'. You |
|---|
| 353 | can assign this default handler explicitly or by simply passing in NULL |
|---|
| 354 | as the `r' parameter. |
|---|
| 355 | |
|---|
| 356 | void |
|---|
| 357 | syck_parser_str( SyckParser *p, char *ptr, long len, SyckIoStrRead r ) |
|---|
| 358 | |
|---|
| 359 | Assigns a string as the IO source with a callback function `r' |
|---|
| 360 | which handles buffering of the string. |
|---|
| 361 | |
|---|
| 362 | The SyckIoStrRead signature looks like this: |
|---|
| 363 | |
|---|
| 364 | long SyckIoStrRead( char *buf, SyckIoStr *str, long max_size, long skip ); |
|---|
| 365 | |
|---|
| 366 | Syck comes with a default string handler named `syck_io_str_read'. You |
|---|
| 367 | can assign this default handler explicitly or by simply passing in NULL |
|---|
| 368 | as the `r' parameter. |
|---|
| 369 | |
|---|
| 370 | void |
|---|
| 371 | syck_parser_str_auto( SyckParser *p, char *ptr, SyckIoStrRead r ) |
|---|
| 372 | |
|---|
| 373 | Same as the above, but uses strlen() to determine string size. |
|---|
| 374 | |
|---|
| 375 | |
|---|
| 376 | SYMID |
|---|
| 377 | syck_parse( SyckParser *p ) |
|---|
| 378 | |
|---|
| 379 | Parses a single document from the YAML stream, returning the SYMID for |
|---|
| 380 | the root node. |
|---|
| 381 | |
|---|
| 382 | == YAML Emitter == |
|---|
| 383 | |
|---|
| 384 | Since the YAML 0.50 release, Syck has featured a new emitter API. The idea |
|---|
| 385 | here is to let Syck figure out shortcuts that will clean up output, detect |
|---|
| 386 | builtin YAML types and -- especially -- determine the best way to format |
|---|
| 387 | outgoing strings. |
|---|
| 388 | |
|---|
| 389 | The trick with the emitter is to learn its functions and let it do its |
|---|
| 390 | job. If you don't like the formatting Syck is producing, please get in |
|---|
| 391 | contact with the author and pitch your ideas!! |
|---|
| 392 | |
|---|
| 393 | Like the YAML parser, the emitter has a couple of callbacks: namely, |
|---|
| 394 | one for IO output and one for handling nodes. Nodes aren't necessarily |
|---|
| 395 | SyckNodes. Since we're ultimately worried about creating a string, SyckNodes |
|---|
| 396 | become sort of unnecessary. |
|---|
| 397 | |
|---|
| 398 | === The Emitter Process === |
|---|
| 399 | |
|---|
| 400 | 1. Traverse the structure you will be emitting, registering all nodes |
|---|
| 401 | with the emitter using syck_emitter_mark_node(). This step will |
|---|
| 402 | determine anchors and aliases in advance. |
|---|
| 403 | |
|---|
| 404 | 2. Call syck_emit() to begin emitting the root node. |
|---|
| 405 | |
|---|
| 406 | 3. Within your emitter handler, use the syck_emit_* convenience methods |
|---|
| 407 | to build the document. |
|---|
| 408 | |
|---|
| 409 | 4. Call syck_emit_flush() to end the document and push the remaining |
|---|
| 410 | document to the IO stream. Or continue to add documents to the output |
|---|
| 411 | stream with syck_emit(). |
|---|
| 412 | |
|---|
| 413 | === Emitter API === |
|---|
| 414 | |
|---|
| 415 | See <syck.h> for the layout of SyckEmitter. |
|---|
| 416 | |
|---|
| 417 | SyckEmitter * |
|---|
| 418 | syck_new_emitter() |
|---|
| 419 | |
|---|
| 420 | Creates a new Syck emitter. |
|---|
| 421 | |
|---|
| 422 | SYMID |
|---|
| 423 | syck_emitter_mark_node( SyckEmitter *e, st_data_t node ) |
|---|
| 424 | |
|---|
| 425 | Adds an outgoing node to the symbol table, allocating an anchor |
|---|
| 426 | for it if it has repeated in the document and scanning the type |
|---|
| 427 | tag for auto-shortcut. |
|---|
| 428 | |
|---|
| 429 | void |
|---|
| 430 | syck_output_handler( SyckEmitter *e, SyckOutputHandler out ) |
|---|
| 431 | |
|---|
| 432 | Assigns a callback as the output handler. |
|---|
| 433 | |
|---|
| 434 | void out_handler( SyckEmitter *e, char *ptr, long len ); |
|---|
| 435 | |
|---|
| 436 | Receives the emitter object, pointer to the buffer and a count |
|---|
| 437 | of bytes which should be read from the buffer. |
|---|
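An output handler usually just forwards `len' bytes somewhere. The sketch below collects them in a growable buffer. It is a model under assumptions: `toy_emitter' is a stand-in for SyckEmitter that carries nothing but a pointer to the sink, and `out_buffer' and `toy_out_handler' are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer that collects emitted output. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} out_buffer;

/* Stand-in for SyckEmitter: only carries a pointer to user state.
 * How a real emitter exposes user data is not covered here. */
typedef struct {
    out_buffer *sink;
} toy_emitter;

/* Output callback: append exactly len bytes from ptr. The bytes are
 * not assumed to be null-terminated. */
void toy_out_handler(toy_emitter *e, char *ptr, long len)
{
    out_buffer *b;

    assert(e != NULL && e->sink != NULL && len >= 0);
    b = e->sink;
    if (b->len + (size_t)len + 1 > b->cap) {
        size_t ncap = (b->len + (size_t)len + 1) * 2;
        char *grown = realloc(b->data, ncap);
        if (grown == NULL)
            return;            /* drop the chunk on allocation failure */
        b->data = grown;
        b->cap = ncap;
    }
    if (len > 0)
        memcpy(b->data + b->len, ptr, (size_t)len);
    b->len += (size_t)len;
    b->data[b->len] = '\0';    /* keep the buffer printable */
}
```

Relying on `len' instead of strlen() matters because the handler is handed a slice of a buffer, not necessarily a C string.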
| 438 | |
|---|
| 439 | void |
|---|
| 440 | syck_emitter_handler( SyckEmitter *e, SyckEmitterHandler |
|---|
| 441 | |
|---|
| 442 | |
|---|
| 443 | void |
|---|
| 444 | syck_free_emitter |
|---|