Zar points, useful or waste of energy


Zar points, useful or waste of energy New to the concept, does it help...

#81 inquiry

Group: Admin
Posts: 14,566
Joined: 2003-February-13
Gender:Male
Location:Amelia Island, FL
Interests:Bridge, what else?

Posted 2004-May-18, 12:37

Zar, on May 18 2004, 12:59 PM, said:

I don't know about these 0.08 and 0.32, WHAT they represent, HOW they have been calculated, WHAT boards have been used, ANY explanation of ANY kind. Just don't have time really to pay attention to any numbers out of the air. I can tell you that in "another study" BUM got 0.01, RUM got 0.002, and Zar got 100,000.47. Does that help? :-)

I hope you realize that, like you, I have no idea what these numbers represent. In fact, I think I made it clear they are just numbers without supporting data. I only gave them because I assume they are associated with Tysen's challenge to you, which he said you accepted. See the first link in the article you quote. As for me, I am convinced: I am using Zar points pretty much all the time now, except when I am lazy.

Ben

--Ben--

#82 Zar

Group: Full Members
Posts: 153
Joined: 2004-April-03

Posted 2004-May-18, 12:47

*** Ben wrote: "I hope you realize that, like you, I have no idea what these numbers represent.
<

I do indeed - nor does anyone else actually.

>
In fact, I think I made it clear they are just numbers without supporting data. I only gave them because I assume they are associated with Tysen's challenge to you, which he said you accepted. See the first link in the article you quote.
<

I DID see the link and I even quoted it.

1) it is to Mike's request;

2) it is for an evaluation that has neither 0.25 nor 0.75 numbers in it.

Have a look yourself.

I don't mind running all kinds of requests as long as they SOMEHOW relate to a method like Zar Points or the 5-3-1 evaluation etc. that stand a CHANCE for at-the-table usage (when people are not lazy :-)

Cheers, Ben:

ZAR

#83 tysen2k

Group: Full Members
Posts: 406
Joined: 2004-March-25

Posted 2004-May-18, 16:32

Zar, on May 18 2004, 01:47 PM, said:

If you guys actually read the post where I gave the data you would see what those numbers represent.

Quote

All Hands  
            ERROR  SCORE
HCP          1.23  -0.49
HCP+321      1.07   0.00
HCP+531      1.05   0.07
Zar          1.05   0.08
BUMRAP+321   1.03   0.14
BUMRAP+531   1.02   0.21
Binky        0.99   0.32

This compares different methods of evaluation looking at ~2.8 million hands. ERROR is the average number of tricks that you are off between your evaluator's prediction and actual tricks. This error can never reach zero even if you knew your partner's hand perfectly, since you still don't know the location of the opponents' cards. SCORE is an estimate of the number of IMPs you would gain (or lose) per hand using this evaluation against a team using an unimaginative HCP+321.

The explanation for what those numbers are is right there, guys!
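For anyone who wants to check the ERROR column against the spreadsheet, here is a minimal sketch of one way to compute it. It assumes ERROR means the average absolute gap between predicted and actual tricks, as the quoted description reads. The function name and the sample numbers are made up for illustration and are not taken from the posted data.

```python
from statistics import mean


def mean_trick_error(predicted, actual):
    """Average absolute gap, in tricks, between prediction and result."""
    if len(predicted) != len(actual):
        raise ValueError("need exactly one prediction per deal")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical deals: tricks an evaluator predicted vs. tricks actually taken.
actual_tricks = [10, 9, 12, 11]
predicted_tricks = [10.5, 10.0, 11.0, 11.5]

print(mean_trick_error(predicted_tricks, actual_tricks))  # 0.75
```

A lower value means the evaluator's trick estimate sits closer to the real result on average.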

And since you guys are complaining that these are just numbers without supporting data, I've posted a spreadsheet which shows the hands and my generation methods. It's an Excel sheet that has a random sample of about 13,000 hands, but the data shows the same thing as I get with millions of hands (that file would be too large).

I'm even making this be the sample of hands that eliminates NT hands and always makes 9+ tricks (the sample that makes Zar look the best by comparison).

The spreadsheet is in the files section of this Yahoo group (I don't have any other way of posting it):

http://games.groups....oup/bridgeeval/

Unfortunately you have to join the Yahoo group in order to get access to the file, but it's there for all to see.

Tysen

A bit of blatant self-pimping - I've got a new poker book that's getting good reviews.

#84 Zar

Group: Full Members
Posts: 153
Joined: 2004-April-03

Posted 2004-May-18, 19:45

Hi, guys:

I ran the Game boards for the 5-3-1, 3-2-1, WTC and Zar and here are the results. The first set of 18,000 boards are Games of 4H/4S with up to 21 HCP, while in the second set of 63,000 boards there are no restrictions.

For HCP of up to 21, the results are:

================ Overall Results ================
Out of 18030 contracts:

GOREN 3-2-1 (HCP+3-2-1 > 26) got 700 contracts
The WTC (number of tricks > 9) got 2625 contracts
GOREN 5-3-1 (HCP+5-3-1 > 26) got 4641 contracts
Basic Zar Points (no fit) > 51 got 8772 contracts
Fit Zar Points (+3 extra trump) > 51 got 13819 contracts

For HCP of any kind, the results are:

================ Overall Results ================
Out of 63056 contracts:

The WTC (number of tricks > 9) got 19666 contracts
GOREN 3-2-1 (HCP+3-2-1 > 26) got 32688 contracts
GOREN 5-3-1 (HCP+5-3-1 > 26) got 41045 contracts
Basic Zar Points (no fit) > 51 got 49794 contracts
Fit Zar Points (+3 extra trump) > 51 got 55802 contracts

Again, when you go lower on the Level (4 in this case) and if you put no HCP restrictions (like in the second case) the performance gets closer. So, in the 21-HCP set the Zar Points performance is 3 times better than the 5-3-1 method, while when there are no HCP restrictions it is roughly 56K vs. 41K out of 63K total.
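The counting behind these tables can be sketched roughly as follows. This is only an illustration of the idea: count a board for a method when its combined total is strictly above the method's game threshold. The function name and the sample totals are hypothetical; the thresholds (51 for Zar, 26 for Goren) are the ones listed in the tables.

```python
def count_flagged(totals, threshold):
    """Count boards whose combined point total is strictly above the threshold."""
    return sum(1 for total in totals if total > threshold)


# Hypothetical combined totals for five boards that all make game.
zar_totals = [50, 52, 55, 49, 60]
goren_totals = [24, 27, 28, 25, 26]

print(count_flagged(zar_totals, 51))    # 3
print(count_flagged(goren_totals, 26))  # 2
```

On the posted figures, 13819 / 4641 is about 2.98, which is where the "3 times better" for the 21-HCP set comes from.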

Certainly, I'll post all boards on the website along with the analysis for each of them in the "Download" section of the site.

Cheers:

ZAR

#85 inquiry

Group: Admin
Posts: 14,566
Joined: 2003-February-13
Gender:Male
Location:Amelia Island, FL
Interests:Bridge, what else?

Posted 2004-May-18, 21:51

Hi Tysen,

First, your three evaluation posts on rgb have not exactly spurred a rush of discussion of what you are comparing. If I had to guess, I would think the lack of discussion about your methods is due to the fact that few people can determine what the heck you are evaluating, despite the fact that you have written what amounts to 20 single-spaced pages “describing your evaluation method” and said you looked at 2.8 million hands.

I have downloaded your “data” which contains 13094 hands. The data looks accurate as far as it goes, but we were not privy to the “data” until you posted the link here (or we simply missed it). Just as you find fault with ZAR’s approach (and to mine which is not based upon such exhaustive analysis), I think I have found, for me, the flaw in your analysis. You simply count points and say that should be the level.

Zar has convinced the world that cue-bidding or Blackwood is very useful to make sure you are not off two quick tricks for small slam or one quick trick for grand slam. This seems so obvious that of course I apply this metric to any slam/grand slam evaluation process. And, this is important, ZAR points is very aggressive, so such checks are more important playing Zar points than most (all?) other systems. You don’t do this in your analysis, so let me show you how this will skew your results, at least as far as a pragmatic approach is concerned, using your data. To keep it simple for me, I took the 54 NS hands that ZAR points would say bid grand slam, but that went down. Of these 54, 29 were off a cashable ace. Clearly, I would never consider this a “failure”, because I believe in using Blackwood/cue-bidding. Of the 14 hands that have a ZAR count for seven but can make only at the five level, three were off two quick tricks, and even small slam would not be bid.

It is also interesting to note the type of hand where the ZAR count is very high and no quick losers exist, but slam can’t make. They are hands like the following (four of the five NS hands with a ZAR count of 67 or higher that are not off even one quick trick, but which can’t make slam). Note the characteristic horrible fit. On the first one, West has 6 spades and East a spade void; East has 6 hearts and West a singleton.

[Hand diagram not preserved. Scoring: IMPs]

On the second one, East has six clubs, West a void. West is 5-5 in the majors, East 2-2.

[Hand diagram not preserved. Scoring: IMPs]

Board three East a void in spades, west five, East 8 clubs, West void. West five diamonds, East singleton.

[Hand diagram not preserved. Scoring: IMPs]

Board 4, West void in diamonds, East has 8 diamonds, East void in hearts, west five. East singleton club, west six clubs.

[Hand diagram not preserved. Scoring: IMPs]

Similar misfits exist on a lot of the other hands that Zar points (and probably other evaluation systems) overbid. I think any reasonable computer characterization system should apply the “reasonability test” of not bidding grand slams off a cashable ace or two regardless of the point total. I am willing to live with the overbidding on misfits if you must, but even there, I think ZAR fit (and non-fit subtractions) are taken. For instance, when playing ZAR points, you subtract 3 points for every trump you are short in partner’s suit (see Eric's and Zar's discussion). In theory, a void could easily lead to subtracting 9 points. So while you call hand 4 above worth 73 ZAR points, how many points does it have (counting fit points)? In diamonds, West loses 9 points, dropping from 73 to 62. It is interesting to speculate whether East’s hand should be devalued for lack of fit also; he would be minus 9 points for the heart fit or 6 points for the ♣ fit. I am not exactly sure how such secondary suit subtractions should (would?) work. But if so, that would drop the count even further, to 56, suggesting a final contract of 5, not seven.

Similar minus fit (and plus fit) adjustments need to be made on all hands, what ZAR calls his FIT-plus points. I am not terribly surprised that going on sheer ZAR points without a reality check for cashable quick tricks, much less degree of fit, leads to overbidding at slam/grand slam level. This is why, for instance, in my review of the Cavendish thread, I stated up front that a reality check would be in effect (not off two aces, not off two quick tricks in any one suit).

But thanks for posting these hands; they will give me some to evaluate using the plus/minus adjustments (and since you have the normal Zar points there, I can start from that). Too bad you couldn't post more hands; it is fun looking at them, and at the analysis of how many tricks can be made in each denomination.

Ben

--Ben--

#86 tysen2k

Group: Full Members
Posts: 406
Joined: 2004-March-25

Posted 2004-May-19, 10:31

Hi Inquiry,

Yes I agree with your reasoning about grands off aces, etc. And I'm not saying that anyone would bid based on points only. However, all the evaluators are treated equally and would all select a grand off an ace.

I was curious and you can do this yourself on the published data. Limit the hands to only those that can take 9-11 tricks (11702 of the hands). If you run the analysis on those hands only you get an average error in the number of tricks:

Zar off by 0.641 tricks
HCP+531 off by 0.640 tricks
BUM+531 off by 0.627 tricks

I also agree with what you said about misfitting hands. But again all the evaluators are treated the same. You can add in factors to adjust for fit for Zar, but you'll adjust for the others as well and get the same result.

Tysen


#87 tysen2k

Group: Full Members
Posts: 406
Joined: 2004-March-25

Posted 2004-May-19, 10:38

Zar, on May 18 2004, 08:45 PM, said:

Out of 63056 contracts:

The WTC (number of tricks > 9) got 19666 contracts
GOREN 3-2-1 (HCP+3-2-1 > 26) got 32688 contracts
GOREN 5-3-1 (HCP+5-3-1 > 26) got 41045 contracts
Basic Zar Points (no fit) > 51 got 49794 contracts
Fit Zar Points (+3 extra trump) > 51 got 55802 contracts

Again I have no idea what this proves.

When you're finding contracts that make game you say the criterion is >51. Is that really >51, or is it 52-61 (not the hands that would bid slam)? When you do it for small slams, are you doing it for >61 or only 62-66?

If you're doing it on >51 then all my evaluator has to do is bid a grand every time and I'll score perfect on all your methods.
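To show why, here is a small sketch with made-up boards. It assumes the test only counts how many making contracts a method flags. An evaluator that bids on every board then scores a perfect hit rate, and the problem only shows up once you also count the failing boards it bids. All names and data below are hypothetical.

```python
def hit_rate(bids, makes):
    """Share of making boards that the evaluator bids."""
    on_making = [bid for bid, made in zip(bids, makes) if made]
    return sum(on_making) / len(on_making)


def false_alarm_rate(bids, makes):
    """Share of failing boards that the evaluator bids anyway."""
    on_failing = [bid for bid, made in zip(bids, makes) if not made]
    return sum(on_failing) / len(on_failing)


# Hypothetical boards: does the contract make?
makes = [True, True, False, False, True, False]

always_bid = [True] * len(makes)                      # bids every board
selective = [True, True, False, False, False, False]  # bids only two boards

print(hit_rate(always_bid, makes), false_alarm_rate(always_bid, makes))
print(hit_rate(selective, makes), false_alarm_rate(selective, makes))
```

Counting only the first number rewards aggressiveness; the second number is what separates a good evaluator from one that simply always bids.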

[a few edits made]

Tysen


#88 Zar

Group: Full Members
Posts: 153
Joined: 2004-April-03

Posted 2004-May-19, 10:44

***Ben wrote: “On both of these hands “pure” zar points total 67. Ok, Blackwood will keep you out of seven (missing ♥ ace), but what keeps you out of six?
<

When no explicit fit is found in the first 2 rounds of bidding, you drop one level off as even the Zar Bid Machine does actually. This is common for all the three methods supported there – Goren, Bergen, and Zar. Check it out. And in general, the lack of explicit fit in an early stage is a red flag in any system, natural and artificial alike.

Same with the Blackwood – playing Zar Points doesn’t disengage your brain from your mouth :-)

>
Well, most people, from all too bitter experience, underbid with misfits. And both of these qualify as misfits.
<

My point exactly – and again, look at how the Zar Bid Machine (on the home page of the website) deducts 5 points for the lack of explicit fit and one level for all of Goren, Bergen, and Zar.

>
But can misfits be “quantified” using some metric, sort of a negative fit point scale?
<

I had a similar discussion in another thread – misfit is a relative term, because a misfit in the major (say, the opening) suit COMBINED with a FIT in a secondary suit may be VERY powerful. And a double misfit is ... just the opposite of double-superfit :-)

ZAR

#89 tysen2k

Group: Full Members
Posts: 406
Joined: 2004-March-25

Posted 2004-May-19, 11:21

Zar, on May 18 2004, 08:45 PM, said:

Out of 63056 contracts:

The WTC (number of tricks > 9) got 19666 contracts
GOREN 3-2-1 (HCP+3-2-1 > 26) got 32688 contracts
GOREN 5-3-1 (HCP+5-3-1 > 26) got 41045 contracts
Basic Zar Points (no fit) > 51 got 49794 contracts
Fit Zar Points (+3 extra trump) > 51 got 55802 contracts

Wait just a minute here. Am I reading this right where you only have the Goren hands bid games if the total is ">26" (that is 27, 28, etc.)?!? You list Zar as bidding game as >51 so I can only assume this is the case.

I don't know anyone that requires 27 points for game. I often bid game on 25, so maybe this needs some looking into.

Tysen


#90 Zar

Group: Full Members
Posts: 153
Joined: 2004-April-03

Posted 2004-May-19, 11:29

*** tysen2k wrote: "Wait just a minute here. Am I reading this right where you only have the Goren hands bid games if the total is ">26" (that is 27, 28, etc.)?!? You list Zar as bidding game as >51 so I can only assume this is the case.

I don't know anyone that requires 27 points for game. I often bid game on 25, so maybe this needs some looking into.
<

That's NOT the HCP but the combo with the distributional points.

If you often bid Games on 25 HCP, this means you are moving towards aggressive game bidding - welcome to the club :-)

ZAR

#91 tysen2k

Group: Full Members
Posts: 406
Joined: 2004-March-25

Posted 2004-May-19, 12:07

Zar, on May 19 2004, 12:29 PM, said:

That's NOT the HCP but the combo with the distributional points.

If you often bid Games on 25 HCP, this means you are moving towards aggressive game bidding - welcome to the club :-)

I often bid games with 25 including distribution.

I'm just saying that maybe you should back down on your point requirements and see how that changes things in your study. We're trying to study methods of evaluation not simply aggressiveness.


#92 Zar

Group: Full Members
Posts: 153
Joined: 2004-April-03

Posted 2004-May-19, 13:18

***tysen2k wrote: "I often bid games with 25 including distribution. I'm just saying that maybe you should back down on your point requirements and see how that changes things in your study. We're trying to study methods of evaluation not simply aggressiveness.
<

I found a book by the great Charles Goren in my library and I am going to quote word-for-word:

"Where the partnership totals the equivalent of 26 points - two opening bids - game is ATTAINABLE if a FIT is found. If the partnership totals 33 points, you have a chance for a slam... 37 points will normally produce a grand slam".

This also kind-of addresses your set of hands with double-misfit. Anyway, Goren is not the subject in this series of experiments and I can certainly drop it to 25 or even below for that matter - let me know.

Zar

#93 Zar

Group: Full Members
Posts: 153
Joined: 2004-April-03

Posted 2004-May-19, 13:39

Actually, this discussion led me to a very good idea.

I'll start dropping the boundaries of both 5-3-1 and 3-2-1 down, until they reach the level of Zar Points.

Then I'll do the same for the GRAND slam hands.

This will give us an indication of how aggressive the methods are indeed. CLEARLY it would NOT mean that if you start bidding games any time you have 21 Goren points (IF that turns out to be the "equilibrium") you will have the precision of the Zar Points bidding, though :-) I hope you realize that (if not, I'd love to meet you in a high-stakes rubber bridge game :-)

For the sake of these experiments I will use ONLY the standard GIB boards (which are part of my database) rather than generating dynamic hands. I guess it will be very indicative.

Cheers:

ZAR

#94 hrothgar

Group: Advanced Members
Posts: 15,723
Joined: 2003-February-13
Gender:Male
Location:Natick, MA
Interests:Travel
Cooking
Brewing
Hiking

Posted 2004-May-19, 13:57

>I'll start dropping the boundaries of both 5-3-1 and 3-2-1 down,
>until they reach the level of Zar Points.
>Then I'll do the same for the GRAND slam hands.
>This will give us an indication of how aggressive the methods are indeed.

What the #$%)_#@)_$ are you people talking about?
This entire methodology strikes me as ass-backwards...

From my perspective, the "right" way to go about things is as follows:

Start by normalizing the different evaluation criteria to use the same scale.
I'm fond of 0-11. I'm perfectly happy with 0-40, or 0-1, or whatever...

Next, take the database of boards that GIB has analyzed.

1. For each board use a double dummy engine to determine how many tricks that a given side can take in a trump contract.

2. Sort the hands into buckets, based on number of tricks.

Bucket 1 = pairs of hands that will take 13 tricks
Bucket 2 = pairs of hands that will take 12 tricks
Bucket 3 = pairs of hands that will take 11 tricks
...

3. Calculate the strength of the pair of hands using whatever metrics you want (Zar points, Binky points, BUM-RAP, etc).

4. For each bucket and each methodology, calculate the standard deviation...
Smallest standard deviation wins...

Please note: I couldn't care less how "aggressive" a hand valuation system is.
I am very interested in how accurate that system is.

To me, that translates to a known mean associated with a low standard deviation...
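The steps above could be sketched along these lines. This is a rough illustration, not anyone's actual study code: the deal data, the function names and the 0-100 range used for normalizing are all placeholders, and the double dummy trick counts are assumed to be already available for each board.

```python
from collections import defaultdict
from statistics import pstdev


def normalize(value, lowest, highest):
    """Rescale a point total to 0..1 given the evaluator's possible range."""
    return (value - lowest) / (highest - lowest)


def spread_by_tricks(deals, lowest, highest):
    """Return {tricks: std dev of normalized totals} for one evaluator.

    deals is a list of (double_dummy_tricks, combined_point_total) pairs.
    """
    buckets = defaultdict(list)
    for tricks, total in deals:
        buckets[tricks].append(normalize(total, lowest, highest))
    return {tricks: pstdev(values) for tricks, values in buckets.items()}


# Hypothetical boards for a single evaluator on a made-up 0-100 scale.
deals = [(13, 66), (13, 70), (12, 62), (12, 60), (12, 64)]
result = spread_by_tricks(deals, lowest=0, highest=100)

for tricks in sorted(result, reverse=True):
    print(tricks, round(result[tricks], 4))
```

Run the same function once per evaluator, each with its own range, and compare the numbers bucket by bucket.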

Alderaan delenda est

#95 tysen2k

Group: Full Members
Posts: 406
Joined: 2004-March-25

Posted 2004-May-19, 14:10

hrothgar, on May 19 2004, 02:57 PM, said:

What the #$%)_#@)_$ are you people talking about?
This entire methodology strikes me as ass-backwards...

Exactly. Which is what I've been trying to say for a long time.

Simply looking at hands that make grands and then seeing how many of those hands have at least x points is meaningless.

The methodology you describe is exactly what I've done in my experiments. What I've called error is the standard deviation from the expected value.


#96 hrothgar

Group: Advanced Members
Posts: 15,723
Joined: 2003-February-13
Gender:Male
Location:Natick, MA
Interests:Travel
Cooking
Brewing
Hiking

Posted 2004-May-19, 14:29

Tysen

For what it's worth, I did read your original postings on Rec.Games.Bridge

One question about the studies:

Does the relative accuracy of different methods vary based on the level of the contract? I recall that you calculated errors for each evaluation method, but I don't recall any matrix plotting the error for hands that made precisely 10 tricks...

Alderaan delenda est

#97 inquiry

Group: Admin
Posts: 14,566
Joined: 2003-February-13
Gender:Male
Location:Amelia Island, FL
Interests:Bridge, what else?

Posted 2004-May-19, 14:38

hrothgar, on May 19 2004, 02:57 PM, said:

To me, that translates to a known mean associated with a low standard deviation...

Interesting to see Tysen agree with this... but if you want to go that way, invent a system where ZAR points are 0.1 for a jack, 0.2 for a queen, 0.4 for king, or maybe better, 0.01 for a jack, 0.02 for a queen....

The standard deviation will be very low indeed.

If you take the 229 hands in tysen's data base that can make grand slam for NS, and calculate the mean and standard deviation for ZAR and BUM, you will find that

ZAR is 63.43 (+/-) 3.44 (not counting honors in the trump suit, so the average would be more than two points higher)

BUM is 34.19 (+/-) 2.496

So is BUM rap more accurate because it has a smaller SD? Nope. The smaller SD is because it uses smaller numbers. If we normalize the point count for ZAR and for BUM rap for the highest point count that makes (and call that 1.00), the range of counts from ZAR runs from 1.0 down to 0.73. The range for BUM runs from 1.0 down to 0.67. And yes, the BUM rap method has a larger SD:

ZAR (+/-) 0.048
BUM (+/-) 0.067

You will need a better metric I think.
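Here is a quick sketch of the scale effect, for what it is worth. The first part uses a made-up list of totals to show that shrinking the point scale by ten shrinks the SD by ten. The second part divides the two SDs quoted above by their own means, which is a different rescaling from the divide-by-highest-count one used above, so the values differ slightly from 0.048 and 0.067.

```python
from statistics import pstdev


def relative_spread(average, sd):
    """Standard deviation expressed as a fraction of the mean."""
    return sd / average


# Hypothetical totals: the same hands counted on a scale ten times smaller.
totals = [60, 63, 67, 70]
shrunk = [t / 10 for t in totals]
print(pstdev(shrunk) / pstdev(totals))

# Means and SDs quoted above for the 229 NS grand slam hands.
zar_mean, zar_sd = 63.43, 3.44
bum_mean, bum_sd = 34.19, 2.496
print(round(relative_spread(zar_mean, zar_sd), 4))
print(round(relative_spread(bum_mean, bum_sd), 4))
```

Either way of rescaling leaves ZAR with the smaller relative spread on these 229 hands.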

--Ben--

#98 Zar

Group: Full Members
Posts: 153
Joined: 2004-April-03

Posted 2004-May-19, 14:42

*** tysen2k wrote: "Simply looking at hands that make grands and then seeing how many of those hands have at least x points is meaningless.
<

Actually this is EXACTLY what hrothgar suggests and exactly what I AM doing, all the way to using ONLY the standard GIB boards rather than pulling boards with double and triple misfits out of the air.

So ... can we get this specified clearly? I have the GIB boards (all of them) flagged in the DB and I am doing exactly what is suggested by hrothgar, but tysen2k interfered again, spreading the fog of uncertainty :-)

The ONLY thing different is that I simply count the boards that the method "flags" as appropriate to play at THE discussed level. Let me know:

ZAR

#99 hrothgar

Group: Advanced Members
Posts: 15,723
Joined: 2003-February-13
Gender:Male
Location:Natick, MA
Interests:Travel
Cooking
Brewing
Hiking

Posted 2004-May-19, 14:48

inquiry, on May 19 2004, 11:38 PM, said:

hrothgar, on May 19 2004, 02:57 PM, said:

To me, that translates to a known mean associated with a low standard deviation...

Did you notice the section of my post that states:

>Start by normalizing the different evaluation criteria to use the same scale.
>I'm fond of 0-11. I'm perfectly happy with 0-40, or 0-1, or whatever...

Alderaan delenda est

#100 hrothgar

Group: Advanced Members
Posts: 15,723
Joined: 2003-February-13
Gender:Male
Location:Natick, MA
Interests:Travel
Cooking
Brewing
Hiking

Posted 2004-May-19, 14:57

inquiry, on May 19 2004, 11:38 PM, said:

If we normalize the point count for ZAR and for BUM rap for the highest point count that makes (and call that 1.00), the range of counts from ZAR runs from 1.0 down to 0.73. The range for Bum runs from 1.0 down to 0.67. And yes, the BUM rap method has a larger SD

ZAR (+/-) 0.048
BUM (+/-) 0.067

You will need a better metric I think.

Uh...

I think that you and I have a different definition of the word "normalize".
Fixing one point on an axis does not normalize the distribution.

More specifically, saying that Zar points are distributed along (.73, 1.0) while BUM_RAP points are distributed along (.67, 1.0) does not normalize the distribution in any normal sense of the word...

In order to normalize the distribution, both evaluation schemes need to be distributed along the same interval. If you choose the interval [0,1], then the weakest possible pair of hands is a zero, the strongest possible pair of hands is a 1...
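To make the distinction concrete, here is a rough sketch of the two rescalings. The `lowest` and `highest` bounds and the sample totals are made-up placeholders, not the real extremes of Zar points or any other evaluator; the function names are mine.

```python
def scale_by_max(values):
    """Divide by the largest observed value (pins only one point of the axis)."""
    top = max(values)
    return [v / top for v in values]


def min_max(values, lowest, highest):
    """Map the range [lowest, highest] onto [0, 1] (pins both ends of the axis)."""
    span = highest - lowest
    return [(v - lowest) / span for v in values]


# Hypothetical combined totals, with a hypothetical possible range of 20..120.
totals = [50, 60, 70]

print(scale_by_max(totals))
print(min_max(totals, lowest=20, highest=120))  # [0.3, 0.4, 0.5]
```

Only the second version puts two evaluators on the same interval, because it fixes where the weakest and strongest possible pairs of hands land.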

Alderaan delenda est
