Showing posts with label execution. Show all posts

Friday, March 30, 2012

Partial row at end of file - best way to handle?

Hi,

I have a file where there is a partial row at the end. It doesn't cause an error, but I get a "partial row" warning during execution.

What do most people do with these partial rows? Do they just ignore them as long as they don't cause errors? Or is it better to handle the partial row with a conditional split, for example?

Just wondering what other people's thoughts on this are. I tend to be of the "get rid of it" camp, but maybe that's overkill? Just looking for opinions, best practices.

Thanks

Hi Sadie,

I'm not sure I know what you mean. Can you paste the last few lines of the file up here? Also, tell us how you have the connection manager configured (i.e. delimited, fixed-length etc...)

-Jamie

|||

Data:

col1, col2, col3, col4, etc

col1, col2, col3, col4, etc

col1, col2, col3, col4, etc

col1, col2, col3, col4, etc

col1 --> partial row

This gives a warning, but doesn't cause an error.

Comma delimited conn mgr.

|||Is it always the last row, or can it be in the middle?|||

It's only the last row.

If you have partial rows in the middle of the file, this causes an error (unless you're ignoring errors). But SSIS ignores the partial row at the very end without any special error handling.

|||

sadie519590 wrote:

It's only the last row.

If you have partial rows in the middle of the file, this causes an error (unless you're ignoring errors). But SSIS ignores the partial row at the very end without any special error handling.

Then you can leave it. No harm, no foul. Why is the last row incomplete though? Can it be fixed at the source?|||

Can't be fixed at the source, these are automatically generated extracts that get automatically ftp'd and loaded.

Which goes back to my original question:

What is the best practice for handling these partial rows at the end of file?

If they're ignored, SSIS gives a warning, but no errors. So the package still runs.

But I can also add a conditional split to remove it, which is extra work, but ensures all junk rows are removed.

I was just wondering what other people do with these partial end of file rows?

|||Is the row a footer or something? What makes it "partial"?

I don't accept "bad" files for input into SSIS. Unless the mandate is given from above, I consider files like this to be bad, and those who build them need to fix them.

With that said, it's up to you. I don't think there is a best practice. Use whatever you're comfortable with. If you log warnings and get tired of sifting through them, then perhaps you want to gracefully handle the bad rows.|||Yes, these are footers, not bad rows, per se.|||Not sure there is a best practice on this, as Phil said. If it is not causing a problem, I'd leave it.
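For anyone in the "get rid of it" camp, the Conditional Split route is a one-expression job. A sketch of the split condition, assuming the footer row leaves col2 NULL or empty (the column name and the condition are assumptions; use whatever actually distinguishes the footer in your file):

```
ISNULL(col2) || TRIM(col2) == ""
```

Rows matching the expression go to a "Footer" output that is simply left unconnected; everything else flows through the default output to the destination.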

Wednesday, March 21, 2012

Partitioning - Execution Plan

Hello group,
I'm considering creating a partitioned view on a Sales table that contains up to 20 million records. I'm doing some tests and have a result I'm questioning when looking at the generated execution plan.
First, let me explain my structure a bit. In my Sales table, I have Ex-Factory sales information and externally provided information. They're distinguished using an identifier (0=Ex Factory, 1=External).
I've created 8 tables:
SalesTSA1999 Check Constraint (Origin=1 And BilledDate BETWEEN '19990101' And '19991231')
SalesTSA2000 Check Constraint (Origin=1 And BilledDate BETWEEN '20000101' And '20001231')
SalesTSA2001 Check Constraint (Origin=1 And BilledDate BETWEEN '20010101' And '20011231')
SalesTSA2002 Check Constraint (Origin=1 And BilledDate BETWEEN '20020101' And '20021231')
SalesXFactory1999 Check Constraint (Origin=0 And BilledDate BETWEEN '19990101' And '19991231')
SalesXFactory2000 Check Constraint (Origin=0 And BilledDate BETWEEN '20000101' And '20001231')
SalesXFactory2001 Check Constraint (Origin=0 And BilledDate BETWEEN '20010101' And '20011231')
SalesXFactory2002 Check Constraint (Origin=0 And BilledDate BETWEEN '20020101' And '20021231')
and a view, viewSalesPartitioned, that includes all these tables using a UNION ALL.
Then I'm issuing the following query:
Select Sum(Amount)
From viewSalesPartitioned
Where Origin=1 AND BilledDate BETWEEN '20020101' AND '20021231' --No worry about time, it's all set to 00:00:00
When looking at the execution plan in Query Analyzer, I can see the plan is "almost" the perfect query. I mean, on each table I can see a "Filter Cost 0%", except for SalesXFactory2002, where there's a table scan with a cost of 6%. In my opinion it should apply a filter like on the other tables, based on the check constraint on the Origin column. Maybe there's something I don't understand properly (I'm just starting to use execution plan analysis, which is sometimes a bit confusing for me...)
Any help would be appreciated. Thanks!

Christian Hamel wrote:
> [original post quoted in full; snipped]
Indexing on a character or bit value that allows only two values will
normally not add any performance benefit because the column selectivity
is too low. You might be better off using the date as the partitioning
column without the Origin. You could also try adding a hint to force SQL
Server to use the clustered index.
I would also change the check constraints to avoid any possibility of a
time portion triggering an error:
Check Constraint (Origin=1 And BilledDate >= '20000101' And BilledDate <
'20010101')
David Gugick
Quest Software
www.imceda.com
www.quest.com|||"Christian Hamel" <chamel@.NoSpAmMeRaLlOwEd.CoM> wrote in message
news:O%23AEf7exFHA.3720@.TK2MSFTNGP11.phx.gbl...
> Hello group,
> I'm considering creating a Partitioned view on a Sales table that
> contains up to 20 Millions records. I'm doing some test and have a result
> I'm questionning when looking at the Execution Plan generated.
> First, let me explain a bit my structure. In my Sales table, I have
> Ex-Factory sales information and Externally provided information. There
> distinguised using an identifier (0=Ex Factory, 1=External).
> I've created 8 Tables
> SalesTSA1999 Check Constraint (Origin=1 And BilledDate
> BETWEEN '19990101' And '19991231')
> SalesTSA2000 Check Constraint (Origin=1 And BilledDate
> BETWEEN '20000101' And '20001231')
> SalesTSA2001 Check Constraint (Origin=1 And BilledDate
> BETWEEN '20010101' And '20011231')
> SalesTSA2002 Check Constraint (Origin=1 And BilledDate
> BETWEEN '20020101' And '20021231')
> SalesXFactory1999 Check Constraint (Origin=0 And BilledDate BETWEEN
> '19990101' And '19991231')
> SalesXFactory2000 Check Constraint (Origin=0 And BilledDate BETWEEN
> '20000101' And '20001231')
> SalesXFactory2001 Check Constraint (Origin=0 And BilledDate BETWEEN
> '20010101' And '20011231')
> SalesXFactory2002 Check Constraint (Origin=0 And BilledDate BETWEEN
> '20020101' And '20021231')
> and a view that include all theses tables using a UNION ALL:
> viewSalesPartitioned
> Then, I'm issuing the following Query:
> Select Sum(Amount)
> From viewSalesPartitioned
> Where Origin=1 AND BilledDate BETWEEN '20020101' AND '20021231' --No worry
> about Time, it's all set to 00:00:00
> When looking at the Execution Plan in Query Analyzer I can see the
> execution
> plan is "almost" getting the perfect query. I mean, on each table I can
> see
> a "Filter Cost 0%" Except for SalesXFactory2002 where there's a table scan
> with a cost of 6%. In my opinion it should do a filter like on the other
> tables based on the check constraint on the column Origin. Maybe there's
> something I don't understand properly (I'm just starting using the
> Execution
> Plan analysis which is sometimes a bit confusing for me...)
> Any help would be appreciated. Thanks!
>
Partitioned views only support a single partitioning column.
So the check constraints should be just on the date. And in the query plan
you shouldn't even see the other partitions. What you're currently seeing
is not true partition elimination. The query is just hitting each PK index
and quickly eliminating the partitions that don't contain the requested range.
Partitioned views allow the data in a large table to be split into smaller
member tables. The data is partitioned between the member tables based on
ranges of data values in one of the columns. The data ranges for each member
table are defined in a CHECK constraint specified on the partitioning
column.
http://msdn2.microsoft.com/ms248875
If you add an index on Origin, or perhaps add Origin as well as BilledDate
to your primary key, you should be able to eliminate the table scan. You
will get partition elimination to avoid hitting the other time periods, but
normal index seek behavior to eliminate the other Origin.
David|||My TinyInt column is not part of the index.
I thought that since there was a check constraint on the table that allows
only a specific value, SQL Server would be kind enough not to scan the
records, based on the constraint and the value specified in my query.
Thanks for the reply.
"David Gugick" <david.gugick-nospam@.quest.com> wrote in message
news: %23NC0dEfxFHA.3236@.TK2MSFTNGP14.phx.gbl...
> [David Gugick's reply quoted in full; snipped]|||Ok, I see then. I'll check what I can do.
Thank you very much. I'll go to bed more intelligent tonight :)
"David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in
message news: uzk3sGfxFHA.1032@.TK2MSFTNGP12.phx.gbl...
> [David Browne's reply quoted in full; snipped]
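Putting the two suggestions together — a single date partitioning column, with half-open range constraints so a stray time portion can never fall between partitions — a minimal T-SQL sketch looks like this (table and column definitions are assumed and abbreviated; Origin stays an ordinary column handled by normal indexing inside each partition):

```sql
-- One member table per year, partitioned on BilledDate alone.
CREATE TABLE Sales2001 (
    SaleID     INT      NOT NULL,
    Origin     TINYINT  NOT NULL,
    BilledDate DATETIME NOT NULL
        CHECK (BilledDate >= '20010101' AND BilledDate < '20020101'),
    Amount     MONEY    NOT NULL,
    PRIMARY KEY (BilledDate, SaleID)  -- partitioning column in the PK
);

CREATE TABLE Sales2002 (
    SaleID     INT      NOT NULL,
    Origin     TINYINT  NOT NULL,
    BilledDate DATETIME NOT NULL
        CHECK (BilledDate >= '20020101' AND BilledDate < '20030101'),
    Amount     MONEY    NOT NULL,
    PRIMARY KEY (BilledDate, SaleID)
);
GO

CREATE VIEW viewSalesPartitioned AS
SELECT SaleID, Origin, BilledDate, Amount FROM Sales2001
UNION ALL
SELECT SaleID, Origin, BilledDate, Amount FROM Sales2002;
```

With this shape, a query filtering on BilledDate should touch only the matching member table; the Origin=1 filter is then resolved by an index seek or scan within that one partition rather than by the check constraints.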

Tuesday, March 20, 2012

Parent - Multiple Child Package Execution

I have a parent package "A" and 4 child packages: "B", "B1", "B2", "B3".

In BIDS, I created a file connection in package "A" to connect to the child packages. So whenever I want to run B1 or B2, I change the path in the file connection to point to B1 or B2, and so on. Since the development environment is the file system, this works perfectly fine.

But in the test/production environment, all packages are stored in SQL Server. How can I parameterize the child package connection so that I can use configurations/variables to select which child package to execute?

Thanks

Set up four Execute Package tasks that point to each child package. Then use precedence constraints based on a variable to determine which Execute Package task to run. You could make the variable an integer and then use something like: @ChildPackage == 1, @ChildPackage == 2, etc...|||

One problem I see with this approach is that if I develop another child package, I have to modify the master package to include the new child package and redeploy. As the number of packages increases, it will be difficult to maintain. Basically I just want one Execute Package task that is configurable (say, which package to execute, or the path of the package to execute).

Thanks

|||Could you use two Execute Package tasks, one configured for executing on the file system, the other for the database? Then use precedence constraints to control which one executes. So you'd pass in two variables - one being the connection to the package, the other setting a variable that controls which package executes.
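Another direction, sketched here with assumed names, is to keep a single Execute Package task and drive it from a variable via a property expression, so a new child package needs only a new variable value rather than a redeployed master package:

```
-- Variable (set from a package configuration or /SET on the command line):
--   User::ChildPackagePath = "ChildB2"
-- Property expression on the Execute Package task:
--   PackageName = @[User::ChildPackagePath]
```

For packages stored in SQL Server, PackageName is the path of the package on the server. Whether the property is expressionable in your SSIS version is something to verify; treat this as a direction to test rather than a guaranteed recipe.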


Monday, March 12, 2012

parameters show previous value on second execution of report

I'm running a Crystal report from a C#.NET application, taking parameters from textboxes on the screen, using the parameters to do the data selection, and then printing the parameters in the heading of the report.

The first time I run the report, it displays properly, with the correct info in the heading. If I change the parameters and execute it again, the data on the report changes, but the heading still shows the original parameters that I entered.

Has anyone else encountered a problem like this? Or can you suggest a solution?

Thanks,
Chris

Post the code you are using for displaying the report.|||I solved this already by using a trick that fixes another Crystal problem with the navigation end-page button: I commented out the code in the navigate method and called my PopulateReport method from the page load event.

I don't know why this works, but it solved the problem.

Wednesday, March 7, 2012

Parameters Disabled with Snapshot Report Execution

Is there a way to schedule report snapshots and still give users the option to change the parameters? I found that report parameters are disabled when I set the execution options to use a snapshot. I fixed the problem by setting the report to use cached data that expires on a custom schedule and by scheduling an e-mail subscription that creates a new report cache soon after the cache expires. I'm guessing there must be a better way to provide good report performance with the default parameters as well as flexibility.

You'd need to change from using query parameters in the query to using report filters.

You can read about it here:

http://msdn2.microsoft.com/en-us/library/aa255838(SQL.80).aspx

I would only recommend doing this for filtering relatively small amounts of data. SSRS is not as efficient at filtering data as a stored proc on the database server.
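As a sketch of the idea (table, field, and parameter names are assumed): the dataset query drops its WHERE clause on the parameter so the snapshot contains all rows, and the parameter is applied as a report filter at render time instead:

```sql
-- Before (query parameter - defeats snapshots, since the snapshot
-- is bound to one parameter value):
SELECT Region, Amount FROM Sales WHERE Region = @Region

-- After (snapshot-friendly): the query returns all regions...
SELECT Region, Amount FROM Sales

-- ...and the filter moves to the dataset/report Filters tab:
--   Expression: =Fields!Region.Value
--   Operator:   =
--   Value:      =Parameters!Region.Value
```

The trade-off is the one noted above: the snapshot now stores every region's rows, and SSRS filters them per request, which is only reasonable for modest data volumes.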