diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index 2b16c30903..f81a13f8cf 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -256,6 +256,8 @@ Release 2.8.0 - UNRELEASED
 
   IMPROVEMENTS
 
+    MAPREDUCE-579. Streaming "slowmatch" documentation. (harsh)
+
     MAPREDUCE-6287. Deprecated methods in org.apache.hadoop.examples.Sort
     (Chao Zhang via harsh)
 
diff --git a/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm b/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
index b4c5e38c8e..7f2412e100 100644
--- a/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
+++ b/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
@@ -546,6 +546,13 @@ You can use the record reader StreamXmlRecordReader to process XML documents.
 
 Anything found between BEGIN\_STRING and END\_STRING would be treated as one record for map tasks.
 
+The name-value properties that StreamXmlRecordReader understands are:
+
+* (strings) 'begin' - Characters marking beginning of record, and 'end' - Characters marking end of record.
+* (boolean) 'slowmatch' - Toggle to look for begin and end characters, but within CDATA instead of regular tags. Defaults to false.
+* (integer) 'lookahead' - Maximum lookahead bytes to sync CDATA when using 'slowmatch', should be larger than 'maxrec'. Defaults to 2*'maxrec'.
+* (integer) 'maxrec' - Maximum record size to read between each match during 'slowmatch'. Defaults to 50000 bytes.
+
 $H3 How do I update counters in streaming applications?
 
 A streaming process can use the stderr to emit counter information. `reporter:counter:<group>,<counter>,<amount>` should be sent to stderr to update the counter.
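
Note on the documented defaults: the sketch below is not Hadoop code. It is a minimal Python illustration of how the defaults added by this patch relate to each other, assuming the reader is configured through a comma-separated `name=value` spec such as `StreamXmlRecord,begin=BEGIN_STRING,end=END_STRING`. The helper names `parse_reader_spec` and `effective_settings` are hypothetical, and the sketch assumes property values contain no commas.

```python
# Hypothetical helpers, not part of Hadoop. They only mirror the defaults
# documented in the patch: slowmatch=false, maxrec=50000, lookahead=2*maxrec.
DEFAULT_MAXREC = 50000  # documented default for 'maxrec', in bytes


def parse_reader_spec(spec):
    """Split a 'ReaderName,name=value,...' spec into (reader, properties)."""
    reader, *pairs = spec.split(",")
    props = {}
    for pair in pairs:
        # partition keeps any further '=' characters inside the value
        name, _, value = pair.partition("=")
        props[name] = value
    return reader, props


def effective_settings(props):
    """Apply the documented defaults to the parsed name-value properties."""
    maxrec = int(props.get("maxrec", DEFAULT_MAXREC))
    return {
        "begin": props.get("begin"),
        "end": props.get("end"),
        "slowmatch": props.get("slowmatch", "false").lower() == "true",
        "maxrec": maxrec,
        # 'lookahead' follows 'maxrec' unless it is set explicitly
        "lookahead": int(props.get("lookahead", 2 * maxrec)),
    }


reader, props = parse_reader_spec(
    "StreamXmlRecord,begin=BEGIN_STRING,end=END_STRING,slowmatch=true"
)
settings = effective_settings(props)
print(reader, settings)
```

With only 'slowmatch' set, 'maxrec' falls back to 50000 and 'lookahead' to twice that. Setting 'maxrec' alone moves 'lookahead' with it, which keeps 'lookahead' larger than 'maxrec' as the documentation recommends.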