asp.net - What's wrong with my XML for document ranking?


I wrote a program in C# to calculate TF-IDF and rank documents.

I used the following XML to store word frequencies within documents, and I was criticised heavily for using this structure. Although I use the text of the word itself as the tag name, to me this is efficient and consumes less space. I can also search it easily using XDocument, since it is a nice tree structure. Can someone help me understand why it was criticised so heavily?

The criticism was: how can you put information inside the metadata? (To me, that was innovative.)

<word>
   <siddhartha>
      <doc1> 4 </doc1>
      <doc2> 5 </doc2>
   </siddhartha>
   <inspiration>
      <doc1> 4 </doc1>
      <doc6> 5 </doc6>
   </inspiration>
   ....
</word>

I was suggested this instead:

<word>
   <text> siddhartha </text>
   <doc1> 4 </doc1>
   <text> inspiration </text>
   <doc1> 4 </doc1>
   ...
</word>

With your structure, where the word is the name of the node, the file is hard to parse with generic parsers. There is no defined structure: you need to read the whole document to know which elements it contains.
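To make the point concrete, here is a minimal sketch of what reading the word-as-tag layout from the question might look like with LINQ to XML. The class and method names (`WordAsTagReader`, `Read`) are made up for illustration. Because no element name is known in advance, the code cannot ask for a particular element; it has to enumerate every child and recover the word and the document id from the tag names themselves. The layout also limits the data: XML names cannot contain spaces or start with a digit, so a token such as `3d` could not be stored as a tag without some encoding.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WordAsTagReader
{
    // Reads the "word as element name" layout into word -> (document -> count).
    // Both the word and the document id have to be recovered from tag names,
    // so the whole tree is walked; nothing can be selected by a fixed name.
    public static Dictionary<string, Dictionary<string, int>> Read(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Elements().ToDictionary(
            wordElement => wordElement.Name.LocalName,
            wordElement => wordElement.Elements().ToDictionary(
                docElement => docElement.Name.LocalName,
                docElement => int.Parse(docElement.Value.Trim())));
    }
}
```

This works, but only for a consumer that already knows the convention "every child of the root is a word, every grandchild is a document". A schema or a generic tool has no fixed element names to rely on.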

I might have done it like this (I tried to stay close to your idea):

<words>
   <word id="siddhartha">
      <freq id="doc1"> 4 </freq>
      <freq id="doc2"> 5 </freq>
   </word>
   ....
</words>
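With fixed element names (`words`, `word`, `freq`) and the data moved into `id` attributes, a lookup can be written against the structure rather than against the data. The following is a rough sketch, not a definitive implementation: `WordIndex`, `Frequencies` and `Idf` are hypothetical names, and `totalDocuments` is an assumed parameter because the XML above does not record how many documents exist. The IDF formula used, log(N / df), is one common variant and may differ from the one in the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WordIndex
{
    // Returns document id -> raw term frequency for one word.
    // An unknown word yields an empty dictionary.
    public static Dictionary<string, int> Frequencies(XDocument index, string word)
    {
        return index.Root
            .Elements("word")
            .Where(w => (string)w.Attribute("id") == word)
            .Elements("freq")
            .ToDictionary(
                f => (string)f.Attribute("id"),
                f => int.Parse(f.Value.Trim()));
    }

    // Inverse document frequency as log(N / df), where df is the number of
    // documents containing the word. totalDocuments (N) is supplied by the
    // caller because it is not stored in the XML (assumption).
    public static double Idf(XDocument index, string word, int totalDocuments)
    {
        int documentFrequency = Frequencies(index, word).Count;
        return documentFrequency == 0
            ? 0.0
            : Math.Log((double)totalDocuments / documentFrequency);
    }
}
```

A caller would load the file with `XDocument.Load(path)` and then multiply each value from `Frequencies(index, "siddhartha")` by `Idf(index, "siddhartha", totalDocuments)` to get a per-document TF-IDF score.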
