Data level
Data can come from many sources. The program has to open a file, a database connection, a serial port, a network port, or some other device and start reading a stream of data. At this level the data is an almost meaningless stream of individual bytes.
Format level
These bytes are organized in a specific pattern known as a format, and there are many different formats that data can be organized around.
Fixed length. Each field appears in a strict order and has a fixed length, so every record you read is exactly the sum of those field lengths. There is often a special byte that marks the end of a record, typically a newline or a carriage return. In this format, however, the record separator is optional, because the fixed record length already tells you where each record ends. This is how the IP and TCP headers are laid out in a data packet, at layers 3 and 4 of the OSI model: each byte, and even each bit, has a specific positional meaning. For example, suppose you set a text editor to 80 columns wide and suddenly see the last names all begin at column 20, right down the page, each one followed by spaces until the next field starts at column 32. In that case you are most likely looking at fixed-length data.
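To make the fixed-length idea concrete, here is a minimal Python sketch with two parts. The first part slices a text record by column. The `LAYOUT` table, its field names, and the sample record are hypothetical and only mirror the "column 20 / column 32" example above. The second part unpacks the fixed 20-byte part of an IPv4 header with the standard `struct` module. The header is built in code rather than captured from a real network, and options beyond the first 20 bytes are ignored.

```python
import struct

# Hypothetical 80-column layout, using 1-based editor columns:
# columns 1-19 hold an ID, 20-31 the last name, 32-43 the first name.
# Python slices are 0-based and end-exclusive, so column 20 is index 19.
LAYOUT = {
    "id": (0, 19),
    "last_name": (19, 31),
    "first_name": (31, 43),
}


def parse_fixed_width(line: str) -> dict:
    """Cut one fixed-width text record into fields and strip the space padding."""
    return {name: line[start:end].strip() for name, (start, end) in LAYOUT.items()}


# The fixed 20-byte part of an IPv4 header, in network (big-endian) byte order:
# version+IHL, TOS, total length, identification, flags+fragment offset,
# TTL, protocol, header checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(packet: bytes) -> dict:
    """Unpack the fixed part of an IPv4 header; any options are ignored."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    return {
        "version": ver_ihl >> 4,            # high 4 bits of byte 0
        "ihl": ver_ihl & 0x0F,              # header length in 32-bit words
        "total_length": total_len,
        "flags": flags_frag >> 13,          # top 3 bits of the 16-bit field
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,                  # 6 = TCP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


line = f"{'A-00042':<19}{'SMITH':<12}{'JOHN':<12}"
print(parse_fixed_width(line))

# Build a sample header instead of capturing a real packet (checksum left at 0).
sample = IPV4_HEADER.pack(0x45, 0, 60, 0x1C46, 0x4000, 64, 6, 0,
                          bytes([192, 168, 0, 1]), bytes([192, 168, 0, 199]))
header = parse_ipv4_header(sample)
print(header["src"], "->", header["dst"], "protocol", header["protocol"])
```

Both functions use the same idea: position alone determines meaning. The text version counts characters, and the binary version counts bytes and bits.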
Delimited. Fields are separated by a delimiter, typically a comma or a tab, and an end-of-record marker separates one record from the next. That marker is typically a newline or a carriage return, as in some of the fixed-length data above. The fields usually still have a maximum length or a range of allowed values, but the format itself does not show this. You can usually spot this format by the commas or tabs in the data, and every record will normally contain the same number of them.
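A minimal Python sketch of reading delimited data with the standard `csv` module follows. It also applies the "same number of delimiters in every record" check described above. The function name `read_delimited` and the sample records are made up for illustration. One advantage of `csv.reader` over a plain `str.split` is that it respects quoted fields containing the delimiter.

```python
import csv
import io


def read_delimited(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split delimited text into records and check each has the same field count."""
    # csv.reader handles quoted fields that contain the delimiter,
    # which a plain str.split(delimiter) would break apart.
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    expected = len(rows[0]) if rows else 0
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            raise ValueError(
                f"record {number}: expected {expected} fields, got {len(row)}"
            )
    return rows


print(read_delimited("SMITH,JOHN,42\nDOE,JANE,37\n"))
print(read_delimited("SMITH\tJOHN\t42\n", delimiter="\t"))
print(read_delimited('DOE,"JANE, MD",37\n'))
```

When reading from a real file instead of a string, the `csv` documentation recommends opening it with `newline=""` so that line endings inside quoted fields are handled correctly.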
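Mixed. A message, or record, can combine the approaches above. Most fields are delimited, but a few fields have contents with a fixed, positional structure of their own. HL7 is an example of a mixed format. Its fields are usually separated by the pipe character (|) rather than by commas or tabs, while values such as dates are written as positional digits like YYYYMMDD.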
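Here is a rough Python sketch of the mixed approach, applied to a single HL7 v2 PID (patient identification) segment. The parser first splits on the delimiters, then reads the birth date by position. This is a deliberately simplified illustration, not a real HL7 parser. It assumes the default `|` and `^` separators and ignores escape sequences, repetitions, and the encoding characters that a real message declares in its MSH segment. The sample segment is invented.

```python
from datetime import date


def parse_pid_segment(segment: str) -> dict:
    """Parse a simplified HL7 v2 PID segment (illustrative sketch only).

    Delimited level: fields are split on '|', name components on '^'.
    Fixed level: the birth date is read by position as YYYYMMDD.
    Escape sequences, repetitions ('~') and custom encoding characters
    declared in the MSH segment are not handled here.
    """
    fields = segment.split("|")
    if fields[0] != "PID":
        raise ValueError(f"expected a PID segment, got {fields[0]!r}")
    # PID-5 is the patient name; pad so a missing given name does not fail.
    family, given = (fields[5].split("^") + ["", ""])[:2]
    dob = fields[7]  # PID-7, date/time of birth
    return {
        "patient_id": fields[3],   # PID-3, patient identifier
        "family_name": family,
        "given_name": given,
        "birth_date": date(int(dob[0:4]), int(dob[4:6]), int(dob[6:8])),
        "sex": fields[8],          # PID-8, administrative sex
    }


print(parse_pid_segment("PID|1||12345||DOE^JOHN||19800101|M"))
```

The delimiters locate each field, but inside the birth-date field, only the position of each digit tells you whether it belongs to the year, the month, or the day.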
Grammar. This used to be much more difficult than it is now, because today grammar-based data typically means XML, for which standard parsers already exist. In the past, people created many different formats for data that was contextual in nature. Suppose you are trying to parse text that comes from a command line, from a natural language like English, or from a program file written in C or Java. Your parser then has to understand text whose meaning depends on its position, on the parser's current state, and on the order in which the commands appear.
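For the XML case, a minimal Python sketch using the standard `xml.etree.ElementTree` module is shown below. The parser enforces the grammar (nesting, matching start and end tags), so the program only has to walk the resulting tree. The document, its element names, and its attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are made up.
DOCUMENT = """<patients>
  <patient id="12345">
    <name family="DOE" given="JOHN"/>
    <birthDate>1980-01-01</birthDate>
  </patient>
  <patient id="67890">
    <name family="ROE" given="JANE"/>
    <birthDate>1975-06-30</birthDate>
  </patient>
</patients>"""


def read_patients(xml_text: str) -> list[dict]:
    """Let the XML parser apply the grammar, then walk the resulting tree."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed input
    patients = []
    for patient in root.findall("patient"):
        name = patient.find("name")
        patients.append({
            "id": patient.get("id"),
            "family_name": name.get("family") if name is not None else None,
            "birth_date": patient.findtext("birthDate"),
        })
    return patients


for p in read_patients(DOCUMENT):
    print(p)
```

The Python documentation warns that `xml.etree.ElementTree` is not secure against maliciously constructed data, so input from untrusted sources needs extra care.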