asp.net - What's wrong with my XML for document ranking?
I wrote a program in C# to calculate TF-IDF and rank documents.
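For context, the ranking step can be sketched as follows. This is a minimal Python sketch, not the asker's C# code. It assumes one common TF-IDF variant (raw term count times `log(N / df)`) and scores a document by summing the weights of the query terms. The names `tf_idf` and `rank` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_id: {term: weight}} for docs given as {doc_id: [tokens]}.

    Assumed variant: tf = raw count in the document, idf = log(N / df).
    """
    n_docs = len(docs)
    counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    return {
        doc_id: {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for doc_id, c in counts.items()
    }


def rank(docs, query_terms):
    """Return doc ids ordered by the summed TF-IDF weight of the query terms."""
    weights = tf_idf(docs)
    scores = {d: sum(w.get(t, 0.0) for t in query_terms) for d, w in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With `docs = {"doc1": ["siddhartha", "river", "siddhartha"], "doc2": ["inspiration", "river"], "doc3": ["river"]}`, `rank(docs, ["siddhartha"])` puts `"doc1"` first, and `"river"`, which occurs in every document, gets weight 0.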
I used the following XML to store word frequencies within documents, and I was criticised heavily for using this structure. Even though I use the text of the word as the tag name, to me it is efficient and consumes less space. Also, I can search it with XDocument pretty easily, since it has a nice tree structure. Can anyone help me understand why I was criticised so heavily?
The criticism: how can you add further information, such as meta-data, within this structure? (To me, the structure is innovative.)
<word>
  <siddhartha>
    <doc1>4</doc1>
    <doc2>5</doc2>
  </siddhartha>
  <inspiration>
    <doc1>4</doc1>
    <doc6>5</doc6>
  </inspiration>
  ....
</word>
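To see what reading this layout involves, here is a small sketch using Python's `xml.etree.ElementTree` as a stand-in for XDocument (an assumption; the asker's code is C#). It assumes the closing tags have been repaired so the file is well-formed. Note that both the word and the document id have to be read from element *names* (`.tag`), not from text or attributes. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The word-as-element-name layout, with matching closing tags.
XML = """
<word>
  <siddhartha><doc1>4</doc1><doc2>5</doc2></siddhartha>
  <inspiration><doc1>4</doc1><doc6>5</doc6></inspiration>
</word>
"""


def frequencies_word_as_tag(xml_text):
    """Return {word: {doc_id: freq}}; the word and doc id come from .tag."""
    root = ET.fromstring(xml_text)
    return {
        word_el.tag: {doc_el.tag: int(doc_el.text) for doc_el in word_el}
        for word_el in root
    }
```

Because the element names are open-ended data, no schema can list the allowed elements in advance.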
I was advised to use this instead:
<word>
  <text>siddhartha</text>
  <doc1>4</doc1>
  <text>inspiration</text>
  <doc1>4</doc1>
  ...
</word>
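A reader for this flatter layout has to pair each `<text>` element with the `<docN>` siblings that follow it. The Python sketch below (again using `ElementTree` as an assumed stand-in for XDocument, with a hypothetical function name) shows that the link between a word and its counts is carried only by sibling order, so the code must track state while scanning.

```python
import xml.etree.ElementTree as ET

XML = """
<word>
  <text>siddhartha</text><doc1>4</doc1>
  <text>inspiration</text><doc1>4</doc1>
</word>
"""


def frequencies_flat(xml_text):
    """Return {word: {doc_id: freq}} by pairing <text> with following siblings."""
    root = ET.fromstring(xml_text)
    result, current = {}, None
    for el in root:
        if el.tag == "text":
            # A new word starts; the doc elements after it belong to it.
            current = el.text.strip()
            result[current] = {}
        elif current is not None:
            result[current][el.tag] = int(el.text)
    return result
```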
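With your structure, where the word is the name of the node, the file is hard to parse with generic parsers. There is no defined structure: you need to read the whole document to know what it contains. Words also cannot always be element names; for example, "C#" or "2nd" is not a legal XML name.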
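The point about element names can be checked directly. This hypothetical helper (a Python sketch, not part of the original answer) simply tries to parse `<word/>` with `xml.etree.ElementTree` and reports whether the parser accepts it.

```python
import xml.etree.ElementTree as ET


def can_be_element_name(word):
    """Return True if `word` is accepted by the parser as an element name."""
    try:
        ET.fromstring(f"<{word}/>")
        return True
    except ET.ParseError:
        return False
```

`can_be_element_name("siddhartha")` is true, while `"C#"`, `"2nd"` and `"new york"` are all rejected, so a vocabulary stored as element names breaks on ordinary terms.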
Here is what I might have done (I tried to stay close to your idea):
<words>
  <word id="siddhartha">
    <freq id="doc1">4</freq>
    <freq id="doc2">5</freq>
  </word>
  ....
</words>
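With the word and document ids moved into attributes, a fixed element vocabulary (`words`, `word`, `freq`) can be queried with ordinary path expressions. The Python sketch below uses `ElementTree`'s limited XPath support as an assumed stand-in for XDocument queries. The `lang="en"` attribute is a hypothetical example of extra meta-data; readers that do not know about it simply ignore it, which speaks to the criticism about adding information. Function names are hypothetical, and the lookup assumes ids contain no single quotes.

```python
import xml.etree.ElementTree as ET

XML = """
<words>
  <word id="siddhartha">
    <freq id="doc1">4</freq>
    <freq id="doc2">5</freq>
  </word>
  <word id="inspiration" lang="en">
    <freq id="doc1">4</freq>
    <freq id="doc6">5</freq>
  </word>
</words>
"""


def frequencies_by_attribute(xml_text):
    """Return {word: {doc_id: freq}} read from id attributes."""
    root = ET.fromstring(xml_text)
    return {
        w.get("id"): {f.get("id"): int(f.text) for f in w.findall("freq")}
        for w in root.findall("word")
    }


def frequency(xml_text, word, doc):
    """Look up one count with an attribute predicate; 0 if absent."""
    root = ET.fromstring(xml_text)
    el = root.find(f"./word[@id='{word}']/freq[@id='{doc}']")
    return int(el.text) if el is not None else 0
```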