Revision 213, 13.5 kB (checked in by why, 4 years ago)

$Id$

This is the documentation for libsyck and describes how to extend it.

= Overview =

Syck is designed to take a YAML stream and a symbol table and move
data between the two.  Your job is to simply provide callback functions which
understand the symbol table you are keeping.

Syck also includes a simple symbol table implementation.

== About the Source ==

The Syck distribution is laid out as follows:

  lib/             libsyck source (core API)
    bytecode.re    lexer for YAML bytecode (re2c)
    emitter.c      emitter functions
    gram.y         grammar for YAML documents (bison)
    handler.c      internal handlers which glue the lexer and grammar
    implicit.re    lexer for builtin YAML types (re2c)
    node.c         node allocation and access
    syck.c         parser funcs, central funcs
    syck.h         libsyck definitions
    syck_st.c      symbol table functions
    syck_st.h      symbol table definitions
    token.re       lexer for YAML plaintext (re2c)
    yaml2byte.c    simple bytecode emitter
  ext/             ruby, python, php, cocoa extensions
  tests/           unit tests for libsyck
    YTS.c.rb       generates YAML Testing Suite unit test
                   (use: ruby YTS.c.rb > YTS.c)
    Basic.c        allocation and buffering tests
    Parse.c        parser sanity
    Emit.c         emitter sanity

== Using SyckNodes ==

The SyckNode is the structure which YAML data is loaded into
while parsing.  It's also a good structure to use while emitting,
although you may choose to emit directly from your native types
if your extension is very small.

SyckNodes are designed to be used in conjunction with a symbol
table.  More on that in a moment.  For now, think of a symbol
table as a library which stores nodes, assigning each node a
unique identifier.

This identifier is called the SYMID in Syck.  Nodes refer to
each other by SYMIDs, rather than pointers.  This way, the
nodes can be freed as the parser goes.

To be honest, SYMIDs are used because this is the way Ruby
works.  And this technique means Syck can use Ruby's symbol
table directly.  But the included symbol table is lightweight,
solves the problem of keeping too much data in memory, and
simply pairs SYMIDs with your native object type (such as
PyObject pointers).
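
As a rough illustration of the idea above, here is a minimal sketch of a
table that hands out integer ids and pairs each one with a native object
pointer.  The names `Table', `table_add' and `table_lookup', the
`unsigned long' id type and the fixed capacity are all assumptions made
for this example.  They are not the libsyck symbol table API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: these are not libsyck types. */
typedef unsigned long SymId;      /* plays the role of a SYMID */
#define TABLE_MAX 64

typedef struct {
    void  *objs[TABLE_MAX];       /* native objects, stored at index id - 1 */
    SymId  count;                 /* number of ids handed out so far */
} Table;

/* Store `obj' and return its new id; 0 means the table is full. */
SymId table_add( Table *t, void *obj )
{
    assert( t != NULL );
    if ( t->count >= TABLE_MAX ) return 0;
    t->objs[t->count] = obj;
    return ++t->count;
}

/* Fetch the object stored under `id', or NULL if `id' is unknown. */
void *table_lookup( const Table *t, SymId id )
{
    assert( t != NULL );
    if ( id == 0 || id > t->count ) return NULL;
    return t->objs[id - 1];
}
```

Because callers hold ids rather than pointers, whatever carried an id
can be released while the object stays reachable through the table.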

Three kinds of SyckNodes are available:

1. scalar nodes (syck_str_kind):
   These nodes store a string, a length for the string
   and a style (indicating the format used in the YAML
   document).

2. sequence nodes (syck_seq_kind):
   Sequences are YAML's array or list type.
   These nodes store a list of items, whose allocation
   is handled by syck functions.

3. mapping nodes (syck_map_kind):
   Mappings are YAML's dictionary or hashtable type.
   These nodes store a list of pairs, whose allocation
   is handled by syck functions.

The syck_kind_tag enum lists the three kinds above,
which can be tested against the SyckNode.kind field.

PLEASE leave the SyckNode.shortcut field alone!!  It's
used by the parser to work around parser ambiguities!!
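
Testing the kind field usually comes down to a switch.  The sketch below
shows the shape of it.  Only the enum names and the `kind' field come
from the text above; `DemoNode' and `demo_kind_name' are hypothetical,
and the enum is redeclared here (in an assumed order) purely so the
block compiles without <syck.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins so the sketch compiles without <syck.h>.
 * The ordering of the enum values is an assumption. */
enum syck_kind_tag { syck_map_kind, syck_seq_kind, syck_str_kind };

typedef struct {
    enum syck_kind_tag kind;
} DemoNode;                       /* hypothetical, not the real SyckNode */

/* Return a printable name for the node's kind. */
const char *demo_kind_name( const DemoNode *n )
{
    assert( n != NULL );
    switch ( n->kind ) {
        case syck_str_kind: return "scalar";
        case syck_seq_kind: return "sequence";
        case syck_map_kind: return "mapping";
    }
    return "unknown";
}
```

In a real handler, each case would convert the node's data into your
native type instead of returning a name.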

=== Node API ===

  SyckNode *
  syck_alloc_str()
  syck_alloc_seq()
  syck_alloc_map()

    Allocates a node of a given type and initializes its
    internal union to emptiness.  When left as-is, these
    nodes operate as a valid empty string, empty sequence
    and empty map.

    Remember that the node's id (SYMID) isn't set by the
    allocation functions OR any other node functions herein.
    It's up to your handler function to do that.

  void
  syck_free_node( SyckNode *n )

    While the Syck parser will free nodes it creates, use
    this to free your own nodes.  This function will free
    all of its internals, its type_id and its anchor.  If
    you don't need those members freed, please be sure they
    are set to NULL.

  SyckNode *
  syck_new_str( char *str, enum scalar_style style )
  syck_new_str2( char *str, long len, enum scalar_style style )

    Creates scalar nodes from C strings.  The first function
    will call strlen() to determine length.

  void
  syck_replace_str( SyckNode *n, char *str, enum scalar_style style )
  syck_replace_str2( SyckNode *n, char *str, long len, enum scalar_style style )

    Replaces the string content of a node `n', while keeping
    the node's type_id, anchor and id.

  char *
  syck_str_read( SyckNode *n )

    Returns a pointer to the null-terminated string inside scalar node
    `n'.  Normally, you might just want to use:

      char *ptr = n->data.str->ptr;
      long len = n->data.str->len;

  SyckNode *
  syck_new_map( SYMID key, SYMID value )

    Allocates a new map with an initial pair of nodes.

  void
  syck_map_empty( SyckNode *n )

    Empties the set of pairs for a mapping node.

  void
  syck_map_add( SyckNode *n, SYMID key, SYMID value )

    Pushes a key-value pair on the mapping.  While the ordering
    of pairs DOES affect the ordering of pairs on output, loaded
    nodes are deliberately out of order (since YAML mappings do
    not preserve ordering).

    See YAML's builtin !omap type for ordering in mapping nodes.

  SYMID
  syck_map_read( SyckNode *n, enum map_part, long index )

    Loads a specific key or value from position `index' within
    a mapping node.  Great for iteration:

      for ( i = 0; i < syck_map_count( n ); i++ ) {
        SYMID key = syck_map_read( n, map_key, i );
        SYMID val = syck_map_read( n, map_value, i );
      }

  void
  syck_map_assign( SyckNode *n, enum map_part, long index, SYMID id )

    Replaces a specific key or value at position `index' within
    a mapping node.  Useful for replacement only; it will not allocate
    more room when assigned beyond the end of the pair list.

  long
  syck_map_count( SyckNode *n )

    Returns a count of the pairs contained by the mapping node.

  void
  syck_map_update( SyckNode *n, SyckNode *n2 )

    Combines all pairs from mapping node `n2' into mapping node
    `n'.

  SyckNode *
  syck_new_seq( SYMID val )

    Allocates a new seq with an entry `val'.

  void
  syck_seq_empty( SyckNode *n )

    Empties a sequence node `n'.

  void
  syck_seq_add( SyckNode *n, SYMID val )

    Pushes a new item `val' onto the end of the sequence.

  void
  syck_seq_assign( SyckNode *n, long index, SYMID val )

    Replaces the item at position `index' in the sequence
    node with item `val'.  Useful for replacement only; it will not
    allocate more room when assigned beyond the end of the item list.

  SYMID
  syck_seq_read( SyckNode *n, long index )

    Reads the item at position `index' in the sequence node.
    Again, for iteration:

      for ( i = 0; i < syck_seq_count( n ); i++ ) {
        SYMID val = syck_seq_read( n, i );
      }

  long
  syck_seq_count( SyckNode *n )

    Returns a count of items contained by sequence node `n'.

== YAML Parser ==

Syck's YAML parser is extremely simple.  After setting up a
SyckParser struct, along with callback functions for loading
node data, use syck_parse() to start reading data.  Since
syck_parse() only reads single documents, the stream can be
managed by calling syck_parse() repeatedly for an IO source.

The parser has four callbacks: one for reading from the IO
source, one for handling errors that show up, one for
handling nodes as they come in, and one for handling bad
anchors in the document.  Nodes are loaded in the order they
appear in the YAML document, except that nested nodes are
loaded before their parent.

=== How to Write a Node Handler ===

Inside the node handler, the normal process should be:

1. Convert the SyckNode data to a structure meaningful
   to your application.

2. Check for the bad anchor caveat described in
   `A Note About Anchors in Parsing' below.

3. Add the new structure to the symbol table attached
   to the parser, found at parser->syms.

4. Return the SYMID reserved in the symbol table.
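
The steps above can be sketched roughly as follows, for scalar nodes
only.  Everything here is a stand-in so the block compiles on its own:
`HandlerId', `ScalarData', `HandlerSyms', `HandlerParser' and
`scalar_handler' are hypothetical names, and the fixed-size array is an
assumed replacement for the real symbol table.  A real handler has the
SyckNodeHandler signature and receives a SyckNode.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch compiles without <syck.h>. */
typedef unsigned long HandlerId;                       /* plays the role of SYMID */
typedef struct { const char *ptr; long len; } ScalarData; /* incoming node data */
typedef struct { char *objs[16]; HandlerId count; } HandlerSyms;
typedef struct { HandlerSyms syms; } HandlerParser;    /* has a `syms' table */

/* Node handler sketch for scalar nodes.  Returns 0 on failure. */
HandlerId scalar_handler( HandlerParser *p, const ScalarData *n )
{
    char *copy;
    assert( p != NULL && n != NULL );
    if ( p->syms.count >= 16 ) return 0;

    /* 1. Convert the node data to an application structure: here an
     *    owned, null-terminated copy, since the node is temporary. */
    copy = malloc( (size_t)n->len + 1 );
    if ( copy == NULL ) return 0;
    memcpy( copy, n->ptr, (size_t)n->len );
    copy[n->len] = '\0';

    /* 2. (The bad anchor check would go here; omitted in this sketch.) */

    /* 3. Add the new structure to the symbol table. */
    p->syms.objs[p->syms.count] = copy;

    /* 4. Return the id reserved in the table. */
    return ++p->syms.count;
}
```

The copy in step 1 is the important design choice: the handler keeps
nothing that points back into the node it was given.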

=== Nodes and Memory Allocation ===

One thing about SyckNodes passed into your handler:
Syck WILL free the node once your handler is done with it.
The node is temporary.  So, if you plan on keeping a node
around, you'll need to make yourself a new copy.

And you'll probably need to reassign all the items
in a sequence and pairs in a map.  You can do this
with syck_seq_assign() and syck_map_assign().  But, before
you do that, you might consider using your own node structure
that fits your application better.

=== A Note About Anchors in Parsing ===

YAML anchors can be recursive.  This means deeper alias nodes
can be loaded before the anchor.  This is the trickiest part
of the loading process.

Assuming this YAML document:

  --- &a [*a]

The loading process is:

1. Load alias *a by calling parser->bad_anchor_handler, which
   reserves a SYMID in the symbol table.

2. The `a' anchor is added to Syck's own anchor table,
   referencing the SYMID above.

3. When the anchor &a is found, the SyckNode created is
   given the SYMID of the bad anchor node above.  (Usually
   nodes created at this stage have the `id' blank.)

4. The parser->handler function is called with that node.
   Check for node->id in the handler and overwrite the
   bad anchor node with the new node.
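
The reserve-then-overwrite flow above can be sketched like this.  The
types and functions (`AnchorId', `AnchorSyms', `AnchorNode',
`anchor_reserve', `anchor_handler') are hypothetical stand-ins, and a
zero id is assumed to mean "blank".  The real callbacks have the
SyckBadAnchorHandler and SyckNodeHandler signatures instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without <syck.h>. */
typedef unsigned long AnchorId;              /* plays the role of SYMID */
typedef struct { void *objs[16]; AnchorId count; } AnchorSyms;
typedef struct { AnchorId id; void *value; } AnchorNode; /* id 0 = blank */

/* Steps 1-2: reserve an id for an alias whose anchor is not loaded yet.
 * The slot holds NULL as a placeholder.  Returns 0 if the table is full. */
AnchorId anchor_reserve( AnchorSyms *s )
{
    assert( s != NULL );
    if ( s->count >= 16 ) return 0;
    s->objs[s->count] = NULL;
    return ++s->count;
}

/* Step 4: a node arriving with its id already set overwrites the
 * placeholder; any other node gets a fresh id. */
AnchorId anchor_handler( AnchorSyms *s, const AnchorNode *n )
{
    AnchorId id;
    assert( s != NULL && n != NULL );
    id = n->id;
    if ( id == 0 ) id = anchor_reserve( s );
    if ( id == 0 || id > s->count ) return 0;
    s->objs[id - 1] = n->value;
    return id;
}
```

Anything that stored the reserved id while the alias was being loaded
now resolves to the finished object, which is what lets `&a [*a]'
refer to itself.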

=== Parser API ===

  See <syck.h> for layouts of SyckParser and SyckNode.

  SyckParser *
  syck_new_parser()

    Creates a new Syck parser.

  void
  syck_free_parser( SyckParser *p )

    Frees the parser, as well as associated symbol tables
    and buffers.

  void
  syck_parser_implicit_typing( SyckParser *p, int on )

    Toggles implicit typing of builtin YAML types.  If
    this is passed a zero, YAML builtin types will be
    ignored (!int, !float, etc.)  The default is 1.

  void
  syck_parser_taguri_expansion( SyckParser *p, int on )

    Toggles expansion of types in full taguri.  This
    defaults to 1 and is recommended to stay as 1.
    Turning this off removes a layer of abstraction,
    which will cause incompatibilities between YAML
    documents of differing versions.

  void
  syck_parser_handler( SyckParser *p, SyckNodeHandler h )

    Assign a callback function as a node handler.  The
    SyckNodeHandler signature looks like this:

      SYMID node_handler( SyckParser *p, SyckNode *n )

  void
  syck_parser_error_handler( SyckParser *p, SyckErrorHandler h )

    Assign a callback function as an error handler.  The
    SyckErrorHandler signature looks like this:

      void error_handler( SyckParser *p, char *str )

  void
  syck_parser_bad_anchor_handler( SyckParser *p, SyckBadAnchorHandler h )

    Assign a callback function as a bad anchor handler.
    The SyckBadAnchorHandler signature looks like this:

      SyckNode *bad_anchor_handler( SyckParser *p, char *anchor )

  void
  syck_parser_file( SyckParser *p, FILE *f, SyckIoFileRead r )

    Assigns a FILE pointer as an IO source and a callback function
    which handles buffering of that IO source.

    The SyckIoFileRead signature looks like this:

      long SyckIoFileRead( char *buf, SyckIoFile *file, long max_size, long skip );

    Syck comes with a default FILE handler named `syck_io_file_read'.  You
    can assign this default handler explicitly or by simply passing in NULL
    as the `r' parameter.

  void
  syck_parser_str( SyckParser *p, char *ptr, long len, SyckIoStrRead r )

    Assigns a string as the IO source with a callback function `r'
    which handles buffering of the string.

    The SyckIoStrRead signature looks like this:

      long SyckIoStrRead( char *buf, SyckIoStr *str, long max_size, long skip );

    Syck comes with a default string handler named `syck_io_str_read'.  You
    can assign this default handler explicitly or by simply passing in NULL
    as the `r' parameter.

  void
  syck_parser_str_auto( SyckParser *p, char *ptr, SyckIoStrRead r )

    Same as the above, but uses strlen() to determine string size.

  SYMID
  syck_parse( SyckParser *p )

    Parses a single document from the YAML stream, returning the SYMID for
    the root node.

== YAML Emitter ==

Since the YAML 0.50 release, Syck has featured a new emitter API.  The idea
here is to let Syck figure out shortcuts that will clean up output, detect
builtin YAML types and -- especially -- determine the best way to format
outgoing strings.

The trick with the emitter is to learn its functions and let it do its
job.  If you don't like the formatting Syck is producing, please get in
contact with the author and pitch your ideas!!

Like the YAML parser, the emitter has a couple of callbacks: namely,
one for IO output and one for handling nodes.  Nodes aren't necessarily
SyckNodes.  Since we're ultimately worried about creating a string, SyckNodes
become sort of unnecessary.

=== The Emitter Process ===

1. Traverse the structure you will be emitting, registering all nodes
   with the emitter using syck_emitter_mark_node().  This step will
   determine anchors and aliases in advance.

2. Call syck_emit() to begin emitting the root node.

3. Within your emitter handler, use the syck_emit_* convenience methods
   to build the document.

4. Call syck_emit_flush() to end the document and push the remaining
   document to the IO stream.  Or continue to add documents to the output
   stream with syck_emit().
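
To see why step 1 has to come before any output, here is a small sketch
of a marking pass.  It is not the SyckEmitter implementation:
`MarkTable', `mark_node' and `needs_anchor' are hypothetical, and the
linear search and fixed capacity are assumptions chosen to keep the
example short.  The point is only that an object seen twice during
marking is known to need an anchor before its first occurrence is
written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: this is not the SyckEmitter API. */
#define MARK_MAX 32
typedef struct {
    const void *objs[MARK_MAX];   /* objects seen during the marking pass */
    int         seen[MARK_MAX];   /* how many times each one was marked */
    int         count;
} MarkTable;

/* Marking pass: record one more sighting of `obj'.
 * Returns the sighting count so far, or 0 if the table is full. */
int mark_node( MarkTable *t, const void *obj )
{
    int i;
    assert( t != NULL );
    for ( i = 0; i < t->count; i++ ) {
        if ( t->objs[i] == obj ) return ++t->seen[i];
    }
    if ( t->count >= MARK_MAX ) return 0;
    t->objs[t->count] = obj;
    t->seen[t->count] = 1;
    t->count++;
    return 1;
}

/* Emitting pass: an object marked more than once needs an anchor. */
int needs_anchor( const MarkTable *t, const void *obj )
{
    int i;
    assert( t != NULL );
    for ( i = 0; i < t->count; i++ ) {
        if ( t->objs[i] == obj ) return t->seen[i] > 1;
    }
    return 0;
}
```

With a single pass, the first occurrence of a shared object would
already be written by the time its second occurrence was discovered.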

=== Emitter API ===

  See <syck.h> for the layout of SyckEmitter.

  SyckEmitter *
  syck_new_emitter()

    Creates a new Syck emitter.

  SYMID
  syck_emitter_mark_node( SyckEmitter *e, st_data_t node )

    Adds an outgoing node to the symbol table, allocating an anchor
    for it if it is repeated in the document and scanning the type
    tag for auto-shortcut.

  void
  syck_output_handler( SyckEmitter *e, SyckOutputHandler out )

    Assigns a callback as the output handler.

      void out_handler( SyckEmitter *e, char *ptr, long len );

    Receives the emitter object, pointer to the buffer and a count
    of bytes which should be read from the buffer.

  void
  syck_emitter_handler( SyckEmitter *e, SyckEmitterHandler

  void
  syck_free_emitter